<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://shacharmirkin.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://shacharmirkin.github.io/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-05-13T19:09:46+00:00</updated><id>https://shacharmirkin.github.io/feed.xml</id><title type="html">Shachar Mirkin</title><subtitle>Personal website of Shachar Mirkin — applied AI research, NLP, snippets, and contact.</subtitle><author><name>Shachar Mirkin</name></author><entry><title type="html">Alpine Ibex Vision Test</title><link href="https://shacharmirkin.github.io/ai/2026/04/30/ibex-vision-llm-test-2026/" rel="alternate" type="text/html" title="Alpine Ibex Vision Test" /><published>2026-04-30T07:00:00+00:00</published><updated>2026-04-30T07:00:00+00:00</updated><id>https://shacharmirkin.github.io/ai/2026/04/30/ibex-vision-llm-test-2026</id><content type="html" xml:base="https://shacharmirkin.github.io/ai/2026/04/30/ibex-vision-llm-test-2026/"><![CDATA[<p>Here’s an image of a herd of ibex I took in August 2024 in the <a href="https://vercors.fr/">Vercors mountain range</a> in France. There are over 20 of them, but yes, they’re really hard to see. I even had a hard time spotting them in the photo just a few hours after taking it, so I was curious whether AI could do it. I asked some models whether they could spot any animals in the photo.</p>

<p>Back then, nearly all state-of-the-art models <a href="https://x.com/shacharmirkin/status/1827401718604116224">failed miserably</a>, typically detecting only a single ibex, but I did get some <a href="http://x.com/shacharmirkin/status/1827751629053149346">funny responses</a>.</p>

<p>In April 2026, I tried it again, and this time the results were much better, with Gemini 3.1 Pro detecting up to 10 ibex.</p>

<p>Possibly the most interesting result came when I asked Gemini to annotate them. It said it couldn’t.
But when I instead asked it to regenerate the image with the animals painted in blue, it produced the image below.
The capability was there all along. It was just a matter of framing the task differently to trigger it.</p>

<div style="text-align:center;">
  <img id="blue-ibex-preview" src="/assets/images/blue-ibex.jpeg" alt="Blue ibex (click to enlarge)" title="Click to enlarge" width="400" style="cursor:zoom-in;" tabindex="0" role="button" aria-label="Open enlarged blue ibex image" />
</div>

<dialog id="blue-ibex-lightbox" style="padding:0; border:none; background:transparent; max-width:none; width:100vw; height:100vh; margin:0;">
  <div style="width:100%; height:100%; display:flex; align-items:center; justify-content:center;">
    <img src="/assets/images/blue-ibex.jpeg" alt="Blue ibex enlarged" style="max-width:92vw; max-height:92vh; display:block;" />
  </div>
</dialog>

<script>
  (function () {
    // Lightbox: clicking (or pressing Enter on) the preview opens the
    // full-size image in a modal <dialog>.
    const preview = document.getElementById("blue-ibex-preview");
    const dialog = document.getElementById("blue-ibex-lightbox");
    // Bail out if the elements are missing or <dialog> is unsupported.
    if (!preview || !dialog || typeof dialog.showModal !== "function") return;

    const openDialog = () => dialog.showModal();
    const closeDialog = () => dialog.close();

    preview.addEventListener("click", openDialog);
    preview.addEventListener("keydown", (event) => {
      if (event.key === "Enter") openDialog();
    });

    dialog.addEventListener("click", (event) => {
      // The flex wrapper fills the dialog, so clicks never land on the
      // dialog element itself; close on any click outside the image.
      if (event.target.tagName !== "IMG") closeDialog();
    });

    dialog.addEventListener("keydown", (event) => {
      // Escape already closes a modal <dialog> natively; Enter mirrors
      // the key that opens it.
      if (event.key === "Escape" || event.key === "Enter") closeDialog();
    });
  })();
</script>]]></content><author><name>Shachar Mirkin</name></author><category term="ai" /><category term="llm" /><category term="vision" /><category term="vlm" /><category term="prompting" /><category term="capabilities" /><category term="evaluation" /><summary type="html"><![CDATA[My anecdotal vision test, and a lesson about triggering the right capabilities]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shacharmirkin.github.io/assets/images/ibex.jpeg" /><media:content medium="image" url="https://shacharmirkin.github.io/assets/images/ibex.jpeg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Grenoble Data Science meetup</title><link href="https://shacharmirkin.github.io/2026/04/19/grenoble-data-science-meetup/" rel="alternate" type="text/html" title="Grenoble Data Science meetup" /><published>2026-04-19T07:00:00+00:00</published><updated>2026-04-19T07:00:00+00:00</updated><id>https://shacharmirkin.github.io/2026/04/19/grenoble-data-science-meetup</id><content type="html" xml:base="https://shacharmirkin.github.io/2026/04/19/grenoble-data-science-meetup/"><![CDATA[<p>I am part of the organizing team for the Grenoble Data Science meetup.
We typically meet once a month in Grenoble for a technical talk (in English) on data science, machine learning, or AI, followed by a social and technical discussion over food and drinks.</p>

<p>If you’d like to attend a talk, join the community, or learn more, check out <a href="https://sites.google.com/view/grenobledatascience/">our site</a> or join our <a href="https://www.linkedin.com/groups/15454046/">LinkedIn group</a>. <strong>If you’d like to present</strong>, contact me or one of the other organizers.</p>]]></content><author><name>Shachar Mirkin</name></author><category term="Community" /><category term="Meetup" /><category term="Data Science" /><category term="Grenoble" /><summary type="html"><![CDATA[Join the Grenoble Data Science community]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shacharmirkin.github.io/assets/images/grenoble-data-science.jpg" /><media:content medium="image" url="https://shacharmirkin.github.io/assets/images/grenoble-data-science.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Universal NER v2 paper</title><link href="https://shacharmirkin.github.io/2026/03/18/universal-ner-v2-lrec-2026/" rel="alternate" type="text/html" title="Universal NER v2 paper" /><published>2026-03-18T07:00:00+00:00</published><updated>2026-03-18T07:00:00+00:00</updated><id>https://shacharmirkin.github.io/2026/03/18/universal-ner-v2-lrec-2026</id><content type="html" xml:base="https://shacharmirkin.github.io/2026/03/18/universal-ner-v2-lrec-2026/"><![CDATA[<p>I’m happy to share that our <a href="https://arxiv.org/pdf/2604.12744">paper</a>, <strong>Universal NER v2</strong>, was accepted to <a href="https://lrec2026.info/">LREC 2026</a>.</p>

<p>In this paper we present Universal NER (UNER) v2, a substantial extension of the dataset introduced in 2024. UNER is a collaborative resource for multilingual named-entity annotation, designed to support cross-lingual NER research.</p>

<p>UNER v2 adds 11 datasets covering 10 typologically diverse languages, including several aligned evaluation benchmarks, while preserving consistent annotation guidelines and high inter-annotator agreement. We provide detailed dataset statistics and benchmark performance using both encoder-based models and LLMs.</p>

<p>We compared human annotation with LLM-based annotation under the same guidelines. Our results show that LLMs still lag behind human annotators, and we analyze the typical mistakes they make. While performance could likely be improved through more elaborate instructions or via agentic workflows, LLMs are not yet dependable annotators. That said, they show promise not only for annotation, but also for identifying inconsistencies in human labels and weaknesses in the guidelines, which we plan to explore in future work.</p>

<p><em>Terra Blevins, Stephen Mayhew, Marek Suppa, Hila Gonen, Shachar Mirkin, Vasile Pais, Kaja Dobrovoljc, Voula Giouli, Jun Kevin, Enes Yılandiloğlu, Eugene Jang, Eungseo Kim, Jeongyeon Seo, Xenophon Gialis and Yuval Pinter. <a href="https://arxiv.org/pdf/2604.12744">Universal NER v2: Towards a Massively Multilingual Named Entity Recognition Benchmark</a>. LREC 2026</em>.</p>

<p><em>Last updated: April 19, 2026</em></p>]]></content><author><name>Shachar Mirkin</name></author><category term="NLP" /><category term="NER" /><category term="LREC" /><category term="Research" /><category term="Benchmarks" /><category term="Multilingual" /><summary type="html"><![CDATA[Universal NER v2 was accepted to LREC 2026.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shacharmirkin.github.io/assets/images/uner.png" /><media:content medium="image" url="https://shacharmirkin.github.io/assets/images/uner.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">When LLMs See What We Don’t</title><link href="https://shacharmirkin.github.io/2026/03/13/llm-sees-something-else/" rel="alternate" type="text/html" title="When LLMs See What We Don’t" /><published>2026-03-13T08:00:00+00:00</published><updated>2026-03-13T08:00:00+00:00</updated><id>https://shacharmirkin.github.io/2026/03/13/llm-sees-something-else</id><content type="html" xml:base="https://shacharmirkin.github.io/2026/03/13/llm-sees-something-else/"><![CDATA[<p>This weekend we have municipal elections here 🇫🇷, so a couple of weeks ago I asked a model about the candidates in our town. I was aware of three candidates, but the model said that two of them had formed an alliance to better challenge the current mayor, and it even provided supporting links. It kind of made sense politically, but when I followed the links, I couldn’t find any of the supposed “proofs”.</p>

<p>I confronted the model about it, but nothing I said could convince it otherwise.
I had to get to the bottom of it, so I kept going back every day to see if it could find new information to support its claim. It just doubled down, sending me more evidence I couldn’t see, along with “I understand your frustration”. 🙄</p>

<p>I even made up a conversation I supposedly had with one of the candidates where they denied the alliance, but it said that’s just what politicians do.</p>

<p>Eventually I checked the HTML source of one of the candidates’ websites, and there it was: the other candidate’s name, hidden in the markup (specifically, as text inside <code class="language-plaintext highlighter-rouge">&lt;span class="hide"&gt;</code> and as a mention at the end of a long <code class="language-plaintext highlighter-rouge">&lt;title&gt;</code> tag), invisible on the rendered page.</p>

<p>Just a good old SEO trick, no prompt injection intended.</p>

<p>The LLM took that and built an entire story about an alliance that never existed, and was completely convinced it was right.</p>

<p>Sometimes we need to remind ourselves that what LLMs see isn’t the same as what we do.</p>
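<p>To make that concrete, here’s a minimal sketch (with made-up markup, not the candidate’s actual page) of the gap between what a browser renders and what a scraper feeds a model:</p>

<pre><code class="language-python">from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Hypothetical page: site CSS hides the "hide" class, so visitors never see it.
html = """
&lt;title&gt;Candidate A for mayor of Our Town, with Candidate B&lt;/title&gt;
&lt;p&gt;Vote for Candidate A!&lt;span class="hide"&gt; and Candidate B&lt;/span&gt;&lt;/p&gt;
"""

soup = BeautifulSoup(html, "html.parser")
# Stripping tags keeps the hidden text: a model reading the scraped page
# "sees" Candidate B twice; a human browsing the page sees them zero times.
print(soup.get_text(" ", strip=True))
</code></pre>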

<hr />
<div style="color: #666;">
<br />
  <ul>
    <li>We’ve all heard about prompt injection, where instructions for LLMs are embedded in documents as white text invisible to humans, but this is the first time I’ve encountered this kind of case.</li>
    <li>As far as I understand, the title-tag strategy is not ideal and may be penalized by Google Search due to signal incoherence.</li>
  </ul>
</div>]]></content><author><name>Shachar Mirkin</name></author><category term="LLMs" /><category term="AI" /><category term="Hallucinations" /><category term="Prompt Injection" /><summary type="html"><![CDATA[A local election story about how an LLM hallucinated a political alliance]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shacharmirkin.github.io/assets/images/llm-sees.jpg" /><media:content medium="image" url="https://shacharmirkin.github.io/assets/images/llm-sees.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Flat-fabricated Inflatables</title><link href="https://shacharmirkin.github.io/2026/01/25/scf2025-mit-inflatables/" rel="alternate" type="text/html" title="Flat-fabricated Inflatables" /><published>2026-01-25T09:00:00+00:00</published><updated>2026-01-25T09:00:00+00:00</updated><id>https://shacharmirkin.github.io/2026/01/25/scf2025-mit-inflatables</id><content type="html" xml:base="https://shacharmirkin.github.io/2026/01/25/scf2025-mit-inflatables/"><![CDATA[<p>Today I’m just proudly sharing my son’s <a href="https://dl.acm.org/doi/10.1145/3745778.3766669">paper</a> presented at <a href="https://scf.acm.org/2025/">SCF2025</a> at MIT 😊</p>

<p><a href="https://ofirmirkin.github.io/">Ofir Mirkin</a> and his co-authors introduce a new type of flat-fabricated inflatable structures that we use to approximate the shapes of target developable surfaces. In other words, they program flat sheets so that when they’re inflated, they naturally morph into a desired 3D shape, like the deckchair in the image above.</p>

<p>This work was done in collaboration with Nathan Vani, Etienne Reyssat, José Bico, and Mélina Skouras.</p>]]></content><author><name>Shachar Mirkin</name></author><category term="Research" /><category term="Soft Robotics" /><category term="Fabrication" /><category term="Geometry" /><summary type="html"><![CDATA[Flat-fabricated inflatable structures for approximating developable surfaces.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shacharmirkin.github.io/assets/images/inflatable.jpeg" /><media:content medium="image" url="https://shacharmirkin.github.io/assets/images/inflatable.jpeg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">vLLM on Google Colab</title><link href="https://shacharmirkin.github.io/2025/04/26/vllm-on-google-colab/" rel="alternate" type="text/html" title="vLLM on Google Colab" /><published>2025-04-26T10:00:00+00:00</published><updated>2025-04-26T10:00:00+00:00</updated><id>https://shacharmirkin.github.io/2025/04/26/vllm-on-google-colab</id><content type="html" xml:base="https://shacharmirkin.github.io/2025/04/26/vllm-on-google-colab/"><![CDATA[<h2 id="motivation">Motivation</h2>

<p><a href="https://github.com/vllm-project/vllm">vLLM</a> is a popular library for fast LLM serving.</p>

<p>I needed to test my code with vLLM but didn’t have access to the actual server.
I couldn’t run it locally with a GPU either (Apple Silicon is not yet supported), so I set up a vLLM server on Google Colab and used <a href="https://www.litellm.ai/">LiteLLM</a> to access it from my computer.</p>
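<p>The client side can then look roughly like this; a minimal sketch, assuming the Colab server is exposed through a public URL (e.g. a tunnel) serving vLLM’s OpenAI-compatible API. The model name and URL below are placeholders:</p>

<pre><code class="language-python">import litellm

response = litellm.completion(
    model="hosted_vllm/Qwen/Qwen2.5-0.5B-Instruct",  # whatever model vLLM serves
    api_base="https://your-tunnel-url.example",  # public URL of the Colab server
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
</code></pre>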

<p>This <a href="https://gist.github.com/shacharmirkin/60d3403909ea5a540f7e17f2c3f2581a">gist</a> shows how it can be done.</p>]]></content><author><name>Shachar Mirkin</name></author><category term="vLLM" /><category term="LiteLLM" /><category term="LLMs" /><category term="Google Colab" /><summary type="html"><![CDATA[Running a vLLM server on Google Colab and accessing it externally]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shacharmirkin.github.io/assets/images/vllm-on-colab.png" /><media:content medium="image" url="https://shacharmirkin.github.io/assets/images/vllm-on-colab.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Invalid Jupyter notebooks on GitHub</title><link href="https://shacharmirkin.github.io/2025/04/16/github-invalid-notbooks/" rel="alternate" type="text/html" title="Invalid Jupyter notebooks on GitHub" /><published>2025-04-16T10:00:00+00:00</published><updated>2025-04-16T10:00:00+00:00</updated><id>https://shacharmirkin.github.io/2025/04/16/github-invalid-notbooks</id><content type="html" xml:base="https://shacharmirkin.github.io/2025/04/16/github-invalid-notbooks/"><![CDATA[<h2 id="the-problem">The Problem</h2>

<p>When trying to view Jupyter notebooks on GitHub, we sometimes see the above ‘Invalid Notebook’ message.
This often happens with notebooks created in <em>Google Colab</em>, because Colab structures notebook metadata differently from what GitHub’s renderer expects.</p>
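<p>One common workaround, sketched below, is to strip the notebook-level <code class="language-plaintext highlighter-rouge">widgets</code> metadata before committing; this assumes (as is typical for Colab exports) that a <code class="language-plaintext highlighter-rouge">metadata.widgets</code> entry missing the <code class="language-plaintext highlighter-rouge">state</code> key is what trips up the renderer:</p>

<pre><code class="language-python">import json
import sys

# Usage: python fix_notebook.py notebook.ipynb
path = sys.argv[1]
with open(path, encoding="utf-8") as f:
    nb = json.load(f)

# Drop the notebook-level "widgets" metadata that GitHub's renderer rejects.
if nb.get("metadata", {}).pop("widgets", None) is not None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
    print(f"Removed widgets metadata from {path}")
</code></pre>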

<p>In this <a href="https://gist.github.com/shacharmirkin/7f608c51f5d1c159d5c5791081eb5c6d">gist</a> I describe several workarounds for this issue, including how to automate a fix as a pre-commit hook.</p>]]></content><author><name>Shachar Mirkin</name></author><category term="GitHub" /><category term="Jupyter" /><category term="Google Colab" /><summary type="html"><![CDATA[Workarounds for dealing with Invalid Notebook error on GitHub]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shacharmirkin.github.io/assets/images/github-invalid-notebook.png" /><media:content medium="image" url="https://shacharmirkin.github.io/assets/images/github-invalid-notebook.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Shoebot</title><link href="https://shacharmirkin.github.io/2024/10/04/shobot/" rel="alternate" type="text/html" title="Shoebot" /><published>2024-10-04T10:00:00+00:00</published><updated>2024-10-04T10:00:00+00:00</updated><id>https://shacharmirkin.github.io/2024/10/04/shobot</id><content type="html" xml:base="https://shacharmirkin.github.io/2024/10/04/shobot/"><![CDATA[<p>We wanted to experiment with a low-budget mobile robot with machine-learning (ML) capabilities. We decided on a project in which the robot’s mission is to detect when someone enters the house wearing shoes, and respond accordingly, helping maintain a shoe-free space! The robot uses ML to distinguish between two categories: shoes (forbidden) and bare feet, socks, or slippers (allowed).</p>

<p>In this <a href="https://github.com/ofirmirkin/Shoebot/">GitHub repo</a> we describe in detail the project’s key components and the challenges we encountered along the way. The full code and a sample dataset are also available there.</p>

<h3 id="contributors">Contributors</h3>

<p><a href="https://github.com/ofirmirkin">Ofir Mirkin</a>, Shachar Mirkin</p>]]></content><author><name>Shachar Mirkin</name></author><category term="Robotics" /><category term="Machine Learning" /><category term="Computer Vision" /><summary type="html"><![CDATA[Shoebot: a low-budget ML robotics project]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shacharmirkin.github.io/assets/images/shoebot.jpg" /><media:content medium="image" url="https://shacharmirkin.github.io/assets/images/shoebot.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Vibe Coding a chess app</title><link href="https://shacharmirkin.github.io/2024/09/25/chess-openings/" rel="alternate" type="text/html" title="Vibe Coding a chess app" /><published>2024-09-25T10:00:00+00:00</published><updated>2024-09-25T10:00:00+00:00</updated><id>https://shacharmirkin.github.io/2024/09/25/chess-openings</id><content type="html" xml:base="https://shacharmirkin.github.io/2024/09/25/chess-openings/"><![CDATA[<p>When I saw that Lichess openings dataset was shared on Hugging Face, I thought I’d use AI to create an app within 45 minutes (like everyone else says they do) to help my daughter practice openings.</p>

<p>Since my front-end skills are quite limited, Cursor was crucial, but I ended up spending much more time than planned, frequently consulting various LLMs, fixing code myself, and reverting changes quite often (lesson learned: when using such tools, commit all the time!).</p>

<p>The result is a little <a href="https://streamlit.io/">Streamlit</a> app for practicing chess openings, and my first project where AI wrote more code than I did.</p>

<p>You can try it <a href="https://huggingface.co/spaces/shachar/chess-openings">on Hugging Face</a> (completely free of course), or check out the code <a href="https://huggingface.co/spaces/shachar/chess-openings/tree/main">here</a>.</p>]]></content><author><name>Shachar Mirkin</name></author><category term="Vibe Coding" /><category term="Cursor" /><category term="Streamlit" /><category term="Chess" /><summary type="html"><![CDATA[Creating a little chess app using (almost) only Cursor]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://shacharmirkin.github.io/assets/images/chess-openings.jpg" /><media:content medium="image" url="https://shacharmirkin.github.io/assets/images/chess-openings.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>