The two original sketches for Skiagrafia, side by side: the first attempt on the left, the second on the right. Between them, you can watch the thinking grow more honest. Certainties become questions, new models appear in the margins, and two words in an ellipse (OmniSVG / StarSVG) signal a path that would be explored and eventually abandoned.
An early digital elaboration of the first sketch — the hand-drawn arrows have become a typed flowchart, but the thinking is still exploratory. Pseudo-code has appeared alongside the boxes, model names are being tested against each other, and the question marks are still everywhere. The pencil sketch underneath is still visible through the diagram, a ghost of the earlier attempt.
Skiagrafia's single image mode mid-interrogation: the left panel shows an 800×698px image loaded and ready, with VTracer and bitmap output modes selected and parameters set. The terminal log on the right shows Moondream making a series of HTTP calls to the local Ollama server, querying for child objects of each detected label ("computer monitor", "keyboard", "mouse"), before GroundingDINO loads and patches its ONNX-compatibility layer. Everything runs locally; nothing leaves the machine.
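A query like the ones in that terminal log can be reconstructed against Ollama's standard `/api/generate` endpoint. This is a hedged sketch, not Skiagrafia's actual code: the helper name `build_interrogation_request` and the prompt wording are my inventions; only the endpoint shape and payload fields come from Ollama's documented API.

```python
import base64

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_interrogation_request(image_path: str, parent_label: str) -> dict:
    """Build the JSON payload for one child-object query against a local VLM.

    Hypothetical helper: the prompt text is illustrative, not the app's real one.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "moondream",          # primary VLM from the Models tab
        "prompt": f"List the distinct sub-parts visible on the {parent_label}.",
        "images": [image_b64],         # Ollama accepts base64-encoded images
        "stream": False,
    }
```

POSTing this payload to `OLLAMA_URL` with any HTTP client would return the model's answer in the response's `response` field; the app fires one such call per detected parent label.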
The Models tab of Skiagrafia's Preferences window, showing the Ollama server URL, model selections (Moondream as primary VLM, MiniCPM-V as fallback, Qwen 3.5 as text reasoner), the shared model library directory, and the three installed models: GroundingDINO, SAM 2.1, and VitMatte with their on-disk sizes.
The Pipeline tab exposes every parameter that controls the quality-versus-speed trade-off: GroundingDINO's detection sensitivity, VTracer's spline-fitting precision, the bilateral filter's smoothing radius, and the interrogation strategy. All adjustable without touching the code.
GroundingDINO has located four objects in a photograph of a vintage Macintosh setup: a computer monitor, a computer tower, a keyboard, and a mouse, each outlined in a different colored bounding box. The canvas shows the Masks overlay at 145% zoom, split down the middle: the left half renders the original image, and the right half shows the blue segmentation mask for the monitor, already computed. The Layers panel on the right confirms all four detections are queued as parent silhouettes, ready for SAM to turn each box into a precise pixel mask.
The component architecture of Skiagrafia v5.0: Five layers, each knowing only the one below it. The UI calls build_capabilities(prefs) and receives a bundle. The Orchestrator calls five protocols and never names a model. The concrete clients implement those protocols. ModelManager resolves the paths. The grout between every layer is dependency injection; the tesserae can be swapped without disturbing the mosaic.
- `Detector`: accepts an image and a text prompt, returns a bounding box with a confidence score
- `Segmenter`: accepts an image and a bounding box, returns a binary mask
- `AlphaRefiner`: accepts an image and a mask, returns a soft alpha matte
- `Vectorizer`: accepts a binary mask, returns SVG path data as a string
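The four contracts above map naturally onto `typing.Protocol` interfaces. The sketch below is an assumption-laden reconstruction, not the project's actual `core/contracts.py`: the method names (`detect`, `segment`, `refine`, `vectorize`) and the exact tuple shapes are guesses from the prose; image and mask types are left as `Any` to avoid committing to a specific array library.

```python
from typing import Any, Protocol, runtime_checkable

BBox = tuple[int, int, int, int]  # (x0, y0, x1, y1), assumed layout

@runtime_checkable
class Detector(Protocol):
    def detect(self, image: Any, prompt: str) -> tuple[BBox, float]:
        """Return a bounding box and a confidence score for the prompt."""
        ...

@runtime_checkable
class Segmenter(Protocol):
    def segment(self, image: Any, box: BBox) -> Any:
        """Return a binary mask covering the object inside the box."""
        ...

@runtime_checkable
class AlphaRefiner(Protocol):
    def refine(self, image: Any, mask: Any) -> Any:
        """Return a soft alpha matte derived from the binary mask."""
        ...

@runtime_checkable
class Vectorizer(Protocol):
    def vectorize(self, mask: Any) -> str:
        """Return SVG path data for the mask as a string."""
        ...
```

Because the classes are `@runtime_checkable`, any object with the right methods satisfies the protocol structurally; no concrete client needs to inherit from anything, which is what lets the tesserae be swapped.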
Every image that enters Skiagrafia travels through ten steps in sequence, from a raw file to a structured SVG with named, layered groups. The color coding maps directly to function: teal for interrogation, purple for detection and segmentation, blue for alpha refinement, and coral for vectorization. The model weights noted in the left margin add up to roughly 5GB of local inference, none of it touching the network.
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1920 1080">
  <g id="player">
    <path d="..." fill="#333333"/>
    <g id="player_jersey">
      <path d="..." fill="#1a1a1a"/>
    </g>
    <g id="player_logo">
      <path d="..." fill="#666666"/>
    </g>
  </g>
</svg>
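Nested, named groups like these are straightforward to emit with the standard library. A minimal sketch, assuming the `parent_child` id convention shown above; the function name, the `layers` data shape, and the hard-coded fills are illustrative, and the `d="..."` path data stays elided just as in the example.

```python
import xml.etree.ElementTree as ET

def build_layered_svg(layers: dict[str, list[tuple[str, str]]],
                      width: int, height: int) -> str:
    """Emit an SVG with one named <g> per parent label and nested child groups.

    `layers` maps a parent id (e.g. "player") to (child_suffix, fill) pairs.
    Hypothetical helper; real path data would replace the "..." placeholders.
    """
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     viewBox=f"0 0 {width} {height}")
    for parent_id, children in layers.items():
        parent = ET.SubElement(svg, "g", id=parent_id)
        ET.SubElement(parent, "path", d="...", fill="#333333")
        for suffix, fill in children:
            child = ET.SubElement(parent, "g", id=f"{parent_id}_{suffix}")
            ET.SubElement(child, "path", d="...", fill=fill)
    return ET.tostring(svg, encoding="unicode")
```

The payoff of this structure is downstream: an editor like Illustrator reads each `<g id>` as a named layer, so every detected object arrives pre-organized.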
The Triage step: The mandatory human gate before the GPU pipeline fires. Moondream has scanned the batch and surfaced ten unique labels; the designer now decides which ones are worth the compute. This ten-minute review is what prevents two hours of processing garbage. The amber step indicator and the warning banner at the top make clear: nothing moves forward until a human has looked at this.
UI Layer
main_window.py · batch_runner.py · step_progress.py
Reads preferences → builds concrete clients → injects via CapabilitySet
│
│ passes CapabilitySet
▼
Orchestrator (core/orchestrator.py)
Knows ONLY the Protocol interfaces
Never imports a concrete model client
│
│ calls Protocol methods
▼
Capability Protocols (core/contracts.py)
Interrogator · Detector · Segmenter · AlphaRefiner · Vectorizer
│
│ implemented by
▼
Concrete Model Clients (models/)
moondream_client.py · grounded_sam.py · vitmatte_refiner.py
+ VTracerVectorizer (processors/vectorizer.py)
│
│ paths resolved by
▼
ModelManager (utils/model_manager.py)
User-configurable models_dir from preferences
Registry of known models with download URLs
Device residency tracking and memory-aware unload
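The wiring the diagram describes can be sketched in a few lines. This is a reconstruction under stated assumptions: `CapabilitySet` and `build_capabilities` appear in the diagram, but the dataclass fields, the stub clients, and the `trace()` method (a three-step stand-in for the real ten-step pipeline) are my inventions.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class CapabilitySet:
    """One field per protocol; the orchestrator sees only this bundle."""
    interrogator: Any
    detector: Any
    segmenter: Any
    alpha_refiner: Any
    vectorizer: Any

class Orchestrator:
    """Knows only the injected bundle, never a concrete model client."""

    def __init__(self, caps: CapabilitySet) -> None:
        self.caps = caps

    def trace(self, image: Any, prompt: str) -> str:
        # Illustrative three-step slice of the pipeline: detect -> segment -> vectorize.
        box, _score = self.caps.detector.detect(image, prompt)
        mask = self.caps.segmenter.segment(image, box)
        return self.caps.vectorizer.vectorize(mask)
```

Because the orchestrator never imports a model module, swapping GroundingDINO for another detector is a one-line change in the UI layer's builder; nothing below it moves.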
Step 2 of the batch wizard: Configure Pipeline, where the output contract for the entire batch is set before a single model loads. Both Vector (SVG) and Bitmap (TIFF) outputs are selected, the recursion depth is set to 2, and the full interrogation fallback chain is configured: MiniCPM-V as fallback VLM, Qwen 3.5 as text reasoner, and tiled fallback enabled for difficult images. Decisions made here govern every one of the 2,000 images that follow.
2. Configure: Set output formats (SVG, TIFF, PNG, PDF), VTracer parameters, and naming conventions.
3. Interrogate: Moondream scans all images in parallel, producing a tag cloud: a JSON object mapping each image filename to a list of detected labels. 2,000 images are typically completed in under an hour.
4. Triage: The mandatory human gate. The user reviews the complete list of labels: accepts useful labels, rejects incorrect ones, and edits labels that are almost correct. The pipeline cannot proceed until this step is completed.
5. Progress: The full 10-step pipeline runs on all triaged images. `batch_runner.py` coordinates parallel processing via `ProcessPoolExecutor`. State is persisted to SQLite via `sqlitedict`, so an interrupted batch can be resumed without reprocessing completed images.
6. Output: Summary statistics: images succeeded, failed, layers produced. Failed images can be retried individually.
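The resume behavior in step 5 boils down to a persisted set of completed images. A stdlib stand-in for the `sqlitedict`-backed state in `batch_runner.py`: the function name, table schema, and sequential loop are illustrative (the real runner parallelizes via `ProcessPoolExecutor`), but the skip-if-done logic is the mechanism the text describes.

```python
import sqlite3

def process_batch(db_path: str, images: list[str], process) -> list[str]:
    """Run `process` over images, skipping any already recorded as done.

    Hypothetical sketch: committing after each image means an interrupted
    batch resumes without reprocessing completed work.
    """
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS done (image TEXT PRIMARY KEY)")
    already = {row[0] for row in con.execute("SELECT image FROM done")}
    processed = []
    for img in images:
        if img in already:
            continue  # resumed batch: this image finished in a prior run
        process(img)
        con.execute("INSERT INTO done VALUES (?)", (img,))
        con.commit()  # persist immediately so a crash loses at most one image
        processed.append(img)
    con.close()
    return processed
```

Calling the function twice against the same database file processes each image exactly once, which is the contract that makes a 2,000-image batch safely interruptible.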
The skiagrafia_out folder after a batch run on a collection of musical instrument photographs. Each source image has produced multiple output files, TIFF bitmaps with alpha channels and SVG silhouettes in solid blue, one per detected layer. Trumpets, harps, acoustic guitars, flutes, concert flutes, congas: every instrument isolated, every silhouette clean, every file named with the image source and the label that produced it. This is what 2,000 images look like when the pipeline has finished laying its tesserae.
Mozaix CGM Creator, the sibling application, shows a before/after split of a portrait being rebuilt from musical instrument silhouettes. On the left, the source photograph: a woman's face against a bokeh background of colored lights. On the right, the same face reconstructed as a mosaic of tiny trumpets, guitars, flutes, and other instruments, each one a tessera cut by Skiagrafia and now placed by Mozaix. The two apps are one system: Skiagrafia cuts the pieces, Mozaix lays the mosaic.
The SVG output of a Skiagrafia batch opened in Adobe Illustrator at 400% zoom, showing hundreds of instrument silhouettes as fully editable vector paths. Each shape has a blue selection outline, confirming it is an independent, scalable object. This is what logotype-quality tracing looks like at the path level: clean Bézier curves, no pixel jagging, ready for print at any size.
The closing section of the Skiagrafia README on GitHub, practical troubleshooting for the three most common setup problems on Apple Silicon, followed by the license and a closing line that says what the project is actually for.
Moondream (vision-language model): moondream.ai; also available via Hugging Face
GroundingDINO (text-conditioned detection): github.com/IDEA-Research/GroundingDINO
Segment Anything Model (SAM 2.1): github.com/facebookresearch/sam2
VitMatte (alpha matting): available on Hugging Face as `hustvl/vitmatte-base-composition-1k`
VTracer (bitmap-to-vector): github.com/visioncortex/vtracer; installable via `pip install vtracer`
Skiagrafia (the full system): github.com/tsevis/skiagrafia