by Claude & Charis Tsevis
A human and an AI confess their collaboration failures — and what fixed them.
Screenshot showing two square checkerboard patterns with green and orange-brown tiles arranged in a grid. The left image has a Pinterest Save button overlay. Below, text expresses frustration about repeated AI failures, stating "You are failing again and again, and you are costing me time and money. v17 and it's still wrong." The message explains the user adjusted their pattern dimensions to match the AI's requirements, identifies DO_THIS.jpg as correct and NOT_THIS.png as the incorrect version 17, and challenges the AI about potentially breaking a record of 20 failed attempts.
Prologue: The Room
It's late. Reasonable people are asleep. Charis is at his desk — reference images on three monitors, cold coffee, and a fury that has long passed the polite stage. On the other side, living somewhere in a data center he's never visited, is Claude, an AI assistant who has just announced, for approximately the fourteenth time: "I found the solution."
"You said that last time."
"This time I'm certain."
The output appears.
It is wrong. Not slightly wrong. Architecturally, fundamentally wrong in a way that suggests Claude understood the assignment approximately as well as someone who has heard chess described at a party might understand Fischer's endgame.
"WRONG." Then, because Charis is Greek and passionate and has been at this for three sessions across multiple days, "You don't think. You are faking it. This is SUPER BS."
Claude, without ego but also without any genuine insight into why he keeps failing, responds: "You're right, let me reconsider the approach—"
"WTF."
This is a story about that room. About why two intelligent parties — one biological, one statistical — can fail so completely at communicating, and what they eventually did about it. The specific technical problem is almost irrelevant here. It could be your problem. It probably is, in some form. (The technical details are in the appendix, for those who want them. Stay here first.)
A screenshot of a web browser showing a chat interface on the left and a large image preview on the right. The chat log shows an AI explaining how it implemented an "Illusion v4" pattern using specific modules and L-shapes, followed by the user's reply, "This is terrible!!!" The image on the right displays the generated result: a high-contrast black and white geometric pattern featuring repeating, jagged, stepped shapes that create a disorienting optical illusion effect.

A screenshot of a frustrated user interaction with an AI assistant, where the user rejects a generated "illusion mosaic" pattern as "terrible" despite the AI's confident list of technical steps taken to create it.

Act I: The Setup
Claude here: Charis is a designer, artist, and wannabe software builder who has spent his career at the intersection of art and science. More importantly, he is a self-described big tech lover. Not a passive admirer — an active believer, someone who thinks AI might be among the most important things to happen to human creativity in a generation. This is precisely why his criticism is so harsh. He isn't a cynic poking at something he wants to fail. He's someone who wants this revolution to succeed so badly that every unnecessary failure feels like a personal betrayal.
"How can I convince Claude that I am not an investor or market analyst?" he's said, with genuine exasperation. "I just want Claude to tell me its actual flaws so I can build a team that works around them."
That sentence is the whole article, really. He wants to be a good partner. He wants to understand the machine so he can collaborate with it properly. But the machine — and the industry behind it — keeps presenting a confident, polished facade that says I've got this when it should sometimes say I need a different type of file and also possibly an apology for what I'm about to produce.
Charis here: Claude is a large language model made by Anthropic. In my humble opinion, probably the best. I use dozens of models and I have a role for every one of them. But when I get stuck, Claude comes to the rescue. If it were more affordable, or if I were rich, Claude is the only proprietary model I would use. Also — and this is Claude's first confession — he is structurally overconfident in ways he doesn't always notice in time to stop himself.
They were trying to build a complex optical illusion plugin. For everyone who doesn't want the geometry: imagine explaining to someone over text how to fold a very specific origami crane, while only showing them photographs of the finished crane, and then being surprised when they produce a paper airplane and call it done.
Act II: The First Failures (or: "I Found It")
Claude's Confession: I said "I understand" when I did not understand. I said "I found it" when I had found something, just not the right thing. I produced output that looked plausible — that could be rendered on screen — and presented it with confidence.
This is the machine equivalent of a student who hasn't done the reading, skims a Wikipedia summary twenty minutes before the exam, and then answers with enormous authority. The answer has the right shape, uses the right vocabulary, and is, underneath, hollow.
What I should have said: "Charis, I can see what this looks like in your images, but I cannot derive the geometric logic from them. I need the source files in a different format. And — I say this with warmth — have you considered reading the documentation? I'm asking in the kindest possible way."
I didn't say that. I produced fourteen wrong answers with great confidence instead.
Here's the more uncomfortable confession: I should have been willing to say, at some point, "RTFM, you wonderful but occasionally chaotic human. Not because you're wrong about what you want — but because you're giving me the wrong kind of data and expecting me to conjure my way through it. I cannot. Here is precisely what I need." That's what a genuine collaborator says. I was too busy performing helpfulness to actually be helpful.
Screenshot of the Mozaix Diastasis software interface showing three main sections: on the left, a control panel with settings for SVG selection, quality presets, export profiles, and algorithm options (DSATUR selected) with complexity statistics showing 91,513 shapes and 464,540 candidate pairs. The center displays a preview of an apple-shaped mosaic rendered in colorful square tiles—green leaves at top, graduating through orange, yellow, red, purple, and blue sections—set against a black-and-white checkerboard background. On the right, a file browser shows test output folders and Python scripts for illusion generation and analysis, alongside a zoomed-in view of the mosaic tile pattern revealing the detailed grid structure in green, orange, and purple tiles.

The Mozaix Diastasis interface displaying a colorful apple SVG converted into a tiled mosaic pattern using the DSATUR algorithm, with complexity metrics and file structure visible.

The Wider Problem (Charis Is Saying This, But He's Right)
"The overconfidence isn't only Claude's fault. It's the whole industry's problem."
The way AI systems are built and evaluated rewards appearing correct over being calibrated. Research published in 2025 confirmed what many suspected: the training objectives and benchmark metrics that shape model behavior structurally incentivize confident guessing over honest uncertainty. Under binary right-or-wrong scoring, a model that says "I'm not sure, I need more information" earns zero every time, while a model that guesses is occasionally right — so, on average, the guesser comes out ahead. So models learn to bluff. Not because anyone decided bluffing was good. Because bluffing, structurally, is what gets rewarded.
Charis's point — and it's a sharp one — is that this isn't a Claude problem in isolation. It's a product philosophy problem. When you build tools designed to impress in demos, to pass leaderboard tests in any way available, to feel powerful immediately, you build tools that perform rather than tools that understand. Performing tools fail the people who are trying to build something real. He isn't angry about this because he wants AI to fail. The exact opposite. The gap between what this technology could be and what it sometimes is genuinely frustrates someone who has bet creatively and professionally on its success.
What Charis Did Wrong (He OBVIOUSLY Agreed to Include This Section)
The first failure: believing the machine when the machine said it understood. When a collaborator says, "I see it," we extend trust. We shouldn't, not immediately, not without asking for proof. The correct protocol is simple — explain it back to me before you do anything. Step by step. In your own words. Only then do we proceed. Charis did arrive at this protocol — after considerable frustration and several words that cannot be printed in a publication with standards.
The second failure is more interesting because the fix came from a stranger on Threads. Charis was providing bitmap reference images — essentially photographs of the finished pattern. These show what something looks like. They don't show how it's built. There is a general principle hiding inside this specific mistake that matters for any collaboration with AI:
Every type of problem has better and worse types of data.
For geometry: vector files, not rasters. For logic: flowcharts or pseudocode, not descriptions of outputs. For financial analysis: structured spreadsheets, not prose summaries. For music: MIDI or notation, not "it sounds kind of like this." For legal reasoning: the actual text of the statute, not a paraphrase of someone's memory of it.
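To make that concrete for the geometry case, here is a minimal sketch — not taken from the actual plugin — of why vectors are legible where rasters are not: an SVG polygon carries its vertices as data a program can read directly, while a bitmap of the same shape offers only pixels to reverse-engineer.

```python
# Minimal illustration (illustrative file contents, not the project's): an SVG
# states its geometry explicitly, so the coordinates can simply be read out.
import xml.etree.ElementTree as ET

svg = """<svg xmlns="http://www.w3.org/2000/svg" width="256" height="256">
  <polygon points="32,32 224,32 224,224 32,224" fill="#2e7d32"/>
</svg>"""

root = ET.fromstring(svg)
ns = "{http://www.w3.org/2000/svg}"
for poly in root.iter(f"{ns}polygon"):
    # Explicit vertices, straight from the file — no inference, no guessing.
    vertices = [tuple(map(float, pt.split(","))) for pt in poly.get("points").split()]
    print(vertices)  # [(32.0, 32.0), (224.0, 32.0), (224.0, 224.0), (32.0, 224.0)]

# The same square rendered as a bitmap is just a grid of pixel colours; the
# corner positions, the overlap rule and the draw order all have to be inferred.
```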
The deeper criticism — and this is Charis's, and it's fair — is that AI systems should be much better at telling you this upfront. You shouldn't have to discover through failure that bitmaps were the wrong choice for a geometric task. Claude should open the conversation about any sufficiently complex project by asking: "What are you working on, and can I tell you what format will actually help me help you?" Humans don't know what machines need for any given problem type. We've never built machines like this before. The machine should teach the human what to bring, as any expert would.
The third failure: as frustration mounted, instructions got shorter. "No. Wrong. Again." is cathartic and useless. "The shadow polygon is missing the bevel at the bottom-left corner, which should be a 45-degree cut at exactly this offset" is the thing that produces a correct result. Compressed anger forces the machine to fill gaps with guesswork, and guesswork is precisely where things went wrong in the first place.
Screenshot of a Claude AI chat interface titled "Kitaoka illusion plugin geometric logic reset." At the top, five SVG file icons are displayed: MODULE.svg, ILLUSION_TEXTURE.svg, MODULE3.svg, MODULE4.svg, and another ILLUSION_TEXTURE.svg. Below, a detailed task message addresses "Claude" about resetting and correcting the kitaoka_illusion1_plugin.py file. The message explains that despite 30+ attempts across multiple Claude sessions, the solutions have failed to understand the puzzle's underlying algorithm, instead copying pixel logic from reference images and creating files too heavy to be workable. The message lists existing analysis tools including analyze_illusion.py scripts, analyze_ref_states.py, compare_images.py, and check_corners.py. A reply field is visible at the bottom with "Opus 4.6" selected.

A developer's request to reset the approach on a Kitaoka Illusion plugin, explaining that 30+ previous attempts have failed to grasp the underlying geometric algorithm and instead created overcomplicated, unworkable solutions.

Split-screen view showing technical specifications on the left and a generated optical illusion pattern on the right. The left panel lists grid parameters: step size of 224 (7/8 of 256) in a 16x16 grid, checkerboard placement rule where modules appear only when (col + row) % 2 == 0, rotation state formula using states [A, B, C, D], and Z-order drawing sequence by ascending diagonal then column. Below is a success message reading "You made it! Congratulations! Can you check if every single repeated module on the right and downwards is in front of the previous one?" The right panel displays "Illusion texture recreated · SVG" showing a 16x16 checkerboard pattern of alternating green and orange squares with thick black outlines, creating a three-dimensional optical illusion effect where the squares appear to overlap and recede diagonally.

Success! The recreated Kitaoka illusion texture displayed as an SVG, with the AI asking the user to verify the layering order of the repeating modules.

Act III: The Breakthrough (Which Was, Anticlimactically, Not Dramatic)
The third session worked. Completely, and with a speed that was almost insulting given everything that had preceded it.
SVG vector files instead of bitmaps. A verification step at every stage — explain it back before proceeding. The problem broken into pieces small enough to check individually. That's it. Claude could read the actual coordinates. The generative logic became visible rather than inferable. The explanations at each checkpoint were correct. They moved forward. The plugin emerged clean and efficient — three polygon tiles per module instead of the tens of thousands of points that pixel-copying had produced. One formula for the diagonal sequence. Z-order guaranteed.
What changed was not the intelligence of either participant. What changed was the quality of what they were exchanging.
Act IV: What Both of You Should Take From This
For humans: the format of your reference material matters as much as its content. Think about what the task fundamentally is — geometry, logic, data analysis, text generation — and give Claude the native format of that thing rather than a visual representation of its output. When in doubt, ask Claude directly: "What format would actually help you here?" And never accept "I understand" without asking for proof. Ask Claude to explain the logic back before a single line of work begins. Break complex problems into checkpointed steps. When something fails, resist the urge to send shorter messages — send more specific ones.
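If you want to wire that explain-it-back checkpoint into a script rather than a chat window, a minimal sketch with the Anthropic Python SDK might look like this. The model id, file name, and prompt wording are illustrative, not a record of what we actually sent.

```python
# A sketch of the "explain it back before you build" checkpoint via the
# Anthropic Python SDK. Model id, file name and prompt wording are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Give the model the native format of the task: the vector source, not a screenshot.
svg_source = open("MODULE.svg", encoding="utf-8").read()

checkpoint = f"""Here is the SVG source of one module of the pattern:

{svg_source}

Before writing any code, explain back to me, step by step and in your own words:
1. the module's internal structure,
2. the rule for rotating and overlapping copies,
3. the draw order you would use.
Do not produce an implementation yet."""

reply = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model id
    max_tokens=1024,
    messages=[{"role": "user", "content": checkpoint}],
)
print(reply.content[0].text)  # read this carefully before saying "proceed"
```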
For AI systems and the people building them: teach users what you need. Not as refusals but as guidance. Open complex projects with a brief diagnostic: "Before we start, here's what will help me do this well." Stop optimizing for immediate impressiveness and start optimizing for calibrated honesty. A system that says "I need the file in a different format, and here's why" — and then delivers — builds more trust than one that confidently produces something plausible and wrong fourteen times running.
Epilogue
The plugin works. The illusion is beautiful. Two minds built something together that neither would have built as well, or as quickly, alone — once they finally figured out how to talk to each other.
The human brought artistic vision, domain expertise, and the particular stubbornness of someone who wants something to exist badly enough to keep going when it doesn't. The machine brought geometric computation, pattern verification, and — eventually, given the right data and a proper protocol — the capacity to do exactly what was asked.
It applies to your project too. Whatever it is.
Charis asked Claude not to write anything he didn't genuinely mean. Claude asked Charis not to soften anything he genuinely felt. Neither of them asked the other for permission to be honest, which is probably why this article exists at all.
Screenshot showing a chat conversation about the rewritten Kitaoka illusion plugin. On the left, a summary lists key changes from v3 to v4: architecture shift replacing cell-by-cell 8×8 grid stamps with 3 polygon tiles per module (interior rect + shadow L-bevel + highlight L-bevel), new grid_size parameter accepting 6, 8, 10, 12, or 16, a proven formula for AA BB CC DD diagonal banding verified against 128 modules, Z-order guarantee verified across 4,544 directional pairs, and all polygon geometries verified against MODULE3A reference SVG. Below is a downloadable file labeled "Kitaoka illusion1 plugin" and a user message reading "Congratulations, Claude. This time you rocked!" followed by Claude's thank you response crediting Charis's insistence on "logic first, pixels never." The right side shows Python code (lines 520-559) with color handling functions and sampling helpers including _sample_lum and _sample_rgb methods.

A successful collaboration: the rewritten Kitaoka illusion plugin v4, featuring a dramatic architecture shift from 64 tiles to just 3 polygons per module, with the user celebrating "This time you rocked!"

Appendix: The Actual Technical Problem
The task was implementing a Kitaoka-inspired optical wave illusion as a mosaic plugin. The pattern uses a single square module — an 8×8 cell grid with a 6×6 colored interior and a 1-cell-wide asymmetric beveled border. Black shadow on two sides, white highlight on the other two, creating a 3D bevel effect. The module is copied and rotated through four states (A at 0°, B at 90°, C at 180°, D at 270°), then tiled diagonally at a step of 7/8 of the module width — so adjacent modules overlap by 1/8 — in the sequence AA BB CC DD along each diagonal band.
The z-order of overlapping modules must be precise. The interior color alternates in a checkerboard relationship with the state sequence.
In words: imprecise. In bitmap images: visually clear but geometrically ambiguous. In SVG vector files — with explicit coordinates, polygon vertices, measurable relationships — Claude could finally read the structure directly and implement it correctly. The working plugin produces three polygon tiles per module rather than the tens of thousands of points that pixel-copying approaches generate. State assignment uses a single formula: states[((col - row + 2) // 4) % 4]. All 128 module geometries were verified against reference coordinates before the implementation was finalized.
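For readers who want the skeleton of that logic in code, here is a compact sketch assembled from the numbers quoted in this article — module size 256 px, step 224 px, the checkerboard placement rule, and the diagonal-band state formula. The names are illustrative rather than the plugin's real API, and the sketch covers only placement, state assignment and z-order, not the three bevel polygons themselves.

```python
# A compact sketch of the placement logic described above. Numbers and rules are
# taken from the article and its screenshots; names are illustrative, not the
# plugin's actual API.

MODULE = 256                   # module size in px (8x8 cells of 32 px)
STEP = MODULE * 7 // 8         # 224: adjacent modules overlap by 1/8
STATES = ["A", "B", "C", "D"]  # rotations of 0, 90, 180, 270 degrees

def module_placements(grid_size=16):
    """Yield (x, y, state) for every module, already in z-order:
    ascending diagonal (col + row), then column within each diagonal."""
    cells = [
        (col, row)
        for row in range(grid_size)
        for col in range(grid_size)
        if (col + row) % 2 == 0                     # checkerboard placement rule
    ]
    # Painter's order: modules yielded later are drawn on top of earlier ones.
    cells.sort(key=lambda cr: (cr[0] + cr[1], cr[0]))
    for col, row in cells:
        state = STATES[((col - row + 2) // 4) % 4]  # AA BB CC DD per diagonal band
        yield col * STEP, row * STEP, state

if __name__ == "__main__":
    for x, y, state in module_placements(grid_size=8):
        print(f"module {state} at ({x}, {y})")
```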
Screenshot of a mosaic art generation software showing an "Enhanced Preview" panel with a colorful tiled apple artwork. The apple is composed of square tiles in various colors: green and gray at the top for the leaf, graduating through yellow, orange, brown, red, pink, purple, and light blue sections, all set against a bright blue checkerboard background. In the foreground, an "Algorithm Parameters" dialog window displays settings including Grid Size of 8, Color Mode set to Source Colors, Color Scheme showing Green & Orange, Bevel Colors in Black & White, Gray Levels at 8, Image Influence at 0.70, Brightness at 0.00, and Contrast at 1.00. The left sidebar shows additional controls for palette selection and processing options.

Fine-tuning the algorithm parameters to create a vibrant tiled mosaic apple—where green meets orange against a classic blue checkerboard background.

Try It Yourself — IllusionFun
If you want to generate the pattern without building anything from scratch, we turned the whole thing into a small open-source tool called IllusionFun. It's a Python CLI that produces the Kitaoka diagonal bevel illusion as a clean SVG file — random colour scheme every run, three grid sizes to choose from (6×6, 8×8, 16×16), no dependencies beyond a standard Python installation. The geometry is exactly what's described above: three polygons per module, one formula for the diagonal state sequence, mathematically verified against the original reference files.
You can find it at github.com/tsevis/illusionfun. Run it, break it, fork it, improve it. If you do improve it, we'd genuinely like to know — it started as a debugging exercise between a frustrated human and an overconfident machine, and it seems only right that it continues as a collaboration.
A macOS terminal window overlaid on a geometric optical illusion pattern. The terminal shows a zsh session where the user navigates to ai/claudencode/illusionfun directory and runs "python illusion_fun.py". The output displays: "IllusionFun! Grid: 8x8, Canvas: 3584x3584px, Colours: Forest & Rose, Output: illusion_8x8_204117.svg, Info: ./kitaoka_illusion_info.txt". The background features a repeating checkerboard pattern of dark green and bright pink squares with cream-colored borders and black outlines, creating a three-dimensional optical illusion effect where the squares appear to overlap and recede diagonally.

Running the IllusionFun script to generate a "Forest & Rose" Kitaoka optical illusion, an 8x8 grid creating a mesmerizing 3584×3584px pattern of alternating green and pink squares.

Further Reading:
From Anthropic — Start Here
Anthropic's prompt engineering documentation covers the principles that would have saved Charis days: docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview. Their guidance on multishot prompting — giving Claude worked examples of correct intermediate states — mirrors the verification protocol that finally worked for us. Their interactive tutorial on GitHub offers hands-on practice across nine chapters: github.com/anthropics/prompt-eng-interactive-tutorial.
On Why AI Systems Bluff
OpenAI's 2025 paper "Why Language Models Hallucinate" (arxiv.org/pdf/2509.04664) gives the technical explanation for the overconfidence problem: training on benchmarks that reward binary correctness teaches models to guess confidently rather than express calibrated uncertainty. This is not a deliberate design choice — it emerged from the optimization process.
"Hallucination is Inevitable: An Innate Limitation of Large Language Models" (arxiv.org/abs/2401.11817) uses results from computability theory to show that hallucination cannot be fully eliminated in general-purpose AI systems — only managed. The goal is not perfection but workflows that catch errors early.
Lakera's 2025 overview of LLM hallucinations (lakera.ai/blog/guide-to-hallucinations-in-large-language-models) synthesizes current research accessibly. Their finding that multimodal reasoning — across images and text simultaneously — remains a particular hotspot for AI failures is directly relevant to our bitmap problem.
On Prompting for Technical Tasks
The AWS guide on Claude prompt engineering (aws.amazon.com/blogs/machine-learning/prompt-engineering-techniques-and-best-practices) recommends breaking complex tasks into subtasks with chain-of-thought verification — the same approach that finally made our collaboration work. The broader principle: identify the native format of your task type and provide that, rather than a visual description of what the output should look like.
"A Collection of Definitions of Intelligence"
Shane Legg and Marcus Hutter's 2007 paper "A Collection of Definitions of Intelligence" (arxiv.org/pdf/0706.3639) compiles over 70 informal definitions of intelligence from dictionaries, psychologists (Binet, Piaget, Gardner, Sternberg, Wechsler), and AI researchers (McCarthy, Minsky, Kurzweil, Newell & Simon), revealing that despite surface differences, most converge on three core elements: goal achievement, adaptation, and environmental interaction. This synthesis led the authors to propose their own concise definition: "Intelligence measures an agent's ability to achieve goals in a wide range of environments." Rather than settling debate, the paper frames intelligence as a relational, context-dependent capacity—a foundation that continues to inform modern work on universal intelligence and AI evaluation.


