Adding search to my static blog.
I finally added search to this blog.[^1]
This is cool because it keeps searches private and offline once the page loads. No third-party boxes, no tracking, just a tiny index and a small script. So I can keep hosting my blog on GitHub Pages and have search for free.
Credit where due: I was inspired by [Vicki's post on client-side search with Lunr](https://vickiboykis.com/2025/08/08/enabling-hugo-static-site-search-with-lunr.js/). She did it on Hugo; I adapted the approach to Jekyll. In Jekyll, generating the data files that Lunr needs is trivial with Liquid and can be done at build time, which made this especially easy.
Press Cmd+K (or Ctrl+K on Windows/Linux) anywhere, or click the search link in the navbar. It’s fast, works offline once loaded, and doesn’t send your queries to any server.
What I built
- Client-side search: powered by Lunr.js (BM25 ranking, stemming, stop-words).
- Index at build time: Jekyll emits a compact document list that Lunr indexes in the browser.
- Simple UI: a modal with keyboard navigation, result count, and highlighted snippets.
This gives quick results, sensible ranking, and zero backend complexity.
How it works
The goal is to keep everything static. Jekyll builds the data; the browser builds the index and handles queries. No services, no endpoints.
Build-time index (Jekyll → JSON-like docs → Lunr)
At build time, Jekyll + Liquid loops over `site.posts` and emits a tiny JavaScript array of post metadata and plain text. That array ships with the page as `window.documents`. On page load, Lunr reads those docs and builds the index entirely in your browser. No endpoints, no JSON file required (though you can also output `/search.json` if you prefer).
Here you can see the Liquid snippet that emits these documents at build time.
```html
window.documents = [
  {% for post in site.posts limit:5 %}
  {
    "id": "{{ forloop.index0 }}",
    "title": {{ post.title | jsonify }},
    "url": "{{ post.url | relative_url }}",
    "date": "{{ post.date | date: '%B %d, %Y' }}",
    "excerpt": {{ post.excerpt | strip_html | truncatewords: 30 | jsonify }},
    "content": {{ post.content | strip_html | jsonify }},
    "tags": {{ post.tags | join: " " | jsonify }}
  }{% unless forloop.last %},{% endunless %}
  {% endfor %}
];
```
Now, using the information in `window.documents`, we can build the index with Lunr. The next snippet shows how to do it.
```html
<script src="https://unpkg.com/lunr/lunr.js"></script>
<script>
  // Build the Lunr index in the browser
  window.searchIndex = lunr(function () {
    this.ref('id');
    this.field('title', { boost: 10 });
    this.field('tags', { boost: 5 });
    this.field('excerpt', { boost: 3 });
    this.field('content');

    window.documents.forEach(function (doc) {
      this.add(doc);
    }, this);
  });
</script>
```
What’s happening here:
- `this.ref('id')` sets the unique identifier for each document. Lunr returns this ref on search so we can map back to `window.documents`.
- `this.field(...)` declares which fields to index and their relative importance via boosts (title > tags > excerpt > content).
- `window.documents.forEach(... this.add(doc) ...)` feeds each post into the index. Passing `this` preserves the Lunr builder as the function context.
The result is an in-memory index built once on page load. We query it with `window.searchIndex.search(query)` and then hydrate results from `window.documents`.
This keeps the site fully static: Jekyll renders the list, the browser builds the index. Compared to Hugo, this felt simpler for me: Liquid makes it easy to shape the data inline without extra templates or generators, and I don’t need a separate JSON step unless I want one. It also fits my hosting: GitHub Pages serves static files only, so I can’t add a server-side search service even if I wanted to.
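For completeness, if I ever do want that separate JSON step, the runtime side is a small change: fetch the file, then build the same index. Here is a minimal sketch, assuming a `/search.json` that holds the same document array (this site currently inlines `window.documents` instead):

```javascript
// Sketch: load a prebuilt /search.json (assumed to hold the same document array)
// and build the same Lunr index shown above once the data arrives.
fetch('/search.json')
  .then((response) => response.json())
  .then((documents) => {
    window.documents = documents;
    window.searchIndex = lunr(function () {
      this.ref('id');
      this.field('title', { boost: 10 });
      this.field('tags', { boost: 5 });
      this.field('excerpt', { boost: 3 });
      this.field('content');
      documents.forEach(function (doc) {
        this.add(doc);
      }, this);
    });
  });
```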
Searching and serving results
When the user types, I run a Lunr query and map the top results back to my `documents` list. Then I render a small card with title, date, and a contextual snippet with the matched terms highlighted.
```javascript
function performSearch(query) {
  if (!query.trim()) return [];
  const matches = window.searchIndex.search(query);
  // Keep the top 10 hits and hydrate them from the original documents
  return matches.slice(0, 10).map(m => {
    const doc = window.documents[parseInt(m.ref, 10)];
    return { ...doc, score: m.score };
  });
}
```
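The snippet and highlight helpers aren't shown above; here is a rough sketch of the idea. `buildSnippet` and `highlightTerms` are hypothetical names, and real code should escape regex metacharacters in the query:

```javascript
// Sketch: cut a short window of text around the first matched term and wrap
// matches in <mark>. Helper names are hypothetical; regex escaping is omitted.
function buildSnippet(text, query, radius = 80) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const lower = text.toLowerCase();
  const hit = terms.map((t) => lower.indexOf(t)).find((i) => i >= 0) ?? 0;
  const start = Math.max(0, hit - radius);
  const end = Math.min(text.length, hit + radius);
  return (start > 0 ? '…' : '') + text.slice(start, end) + (end < text.length ? '…' : '');
}

function highlightTerms(snippet, query) {
  return query
    .split(/\s+/)
    .filter(Boolean)
    .reduce((html, term) => html.replace(new RegExp(`(${term})`, 'gi'), '<mark>$1</mark>'), snippet);
}
```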
Keyboard shortcut and modal
Small touches make it feel native: a quick modal, Cmd+K/Ctrl+K to open, arrows to navigate, Enter to jump.
```javascript
// Open with Cmd+K / Ctrl+K
document.addEventListener('keydown', (e) => {
  if ((e.metaKey || e.ctrlKey) && e.key.toLowerCase() === 'k') {
    e.preventDefault();
    openSearchModal();
  }
});

function openSearchModal() {
  const modal = document.getElementById('searchModal');
  const input = document.getElementById('searchInput');
  modal.style.display = 'block';
  input.focus();
}
```
The UI also supports ESC to close, arrows to navigate, and Enter to open the selected result.
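That handling isn't reproduced in full here; a rough sketch of the pattern, reusing the `searchModal` element from above and assuming hypothetical `.search-result` elements that carry a `data-url` attribute:

```javascript
// Sketch: ESC closes the modal, arrows move the selection, Enter opens the
// selected result. The .search-result elements and their data-url attribute
// are assumptions about how results are rendered.
let selectedIndex = -1;

document.addEventListener('keydown', (e) => {
  const modal = document.getElementById('searchModal');
  if (!modal || modal.style.display !== 'block') return;

  const results = Array.from(document.querySelectorAll('.search-result'));
  if (e.key === 'Escape') {
    modal.style.display = 'none';
  } else if (e.key === 'ArrowDown') {
    e.preventDefault();
    selectedIndex = Math.min(selectedIndex + 1, results.length - 1);
  } else if (e.key === 'ArrowUp') {
    e.preventDefault();
    selectedIndex = Math.max(selectedIndex - 1, 0);
  } else if (e.key === 'Enter' && results[selectedIndex]) {
    window.location.href = results[selectedIndex].dataset.url;
  }
  results.forEach((el, i) => el.classList.toggle('selected', i === selectedIndex));
});
```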
Why Lunr
I didn’t want a service. Lunr is a tiny dependency that gives me real information‑retrieval behavior without infrastructure.
- Ranking that works: BM25 beats my past ad‑hoc scoring.
- Zero deps at runtime: no backend, no external service.
- Small & portable: a single script and a tiny doc list.
Try it
- Tap Cmd+K / Ctrl+K, or click the search link above.
- Type anything you remember from a post: a tag, a phrase, a topic.
If something feels off (results, snippet, ranking), tell me—I’ll tweak the fields/boosts or snippets.
Next steps
I'd like to add semantic search on the client side too. We could compute embeddings at build time with a small embedding model (MiniLM via SentenceTransformers), then embed the query at runtime and rank posts by cosine similarity.
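As a sketch of the runtime half only: assuming the build step has produced a `window.postEmbeddings` array aligned with `window.documents`, and that the query vector comes from some small on-device model (neither exists on this site yet), the ranking itself is just cosine similarity:

```javascript
// Sketch: rank posts by cosine similarity between a query embedding and
// precomputed post embeddings. window.postEmbeddings is an assumption;
// nothing like this ships with the blog today.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankBySimilarity(queryVec, topK = 10) {
  return window.postEmbeddings
    .map((vec, i) => ({ doc: window.documents[i], score: cosineSimilarity(queryVec, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```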
---

[^1]: Yes, I admit it, I vibe coded it.