Lobster Incubator | VoynichLabs

01 RUN LOG

Run ID	Date	Model	Tasks	Status	Failure Point	Notes
EggIncubator_PC_WasteHeat_v1	2026-03-16	Qwen 35B A3B	63/63	✅ COMPLETE	—	DIY egg incubator via PC waste heat
AIChickenIncubator_WasteHeat_v1	2026-03-16	Qwen 35B A3B	~50/63	❌ FAILED ×2	IdentifyDocumentsTask	JSON truncation (EOF at line 260); confirmed Qwen 35B output token ceiling
ChickenEnclosure_Qwen35B_v1	2026-03-15	Qwen 35B A3B	63/63	✅ COMPLETE	—	Under-deck enclosure at 653 Pudding Hill Rd
Batman_RICO_GLM_v2	2026-03-15	GLM 4.7	~40/63	❌ PARTIAL	SelfAuditTask	Kernel panic at ~5.5h
CaptureBatman_Qwen35B_v1	2026-03-15	Qwen 35B A3B	63/63	✅ COMPLETE	—	RICO capture operation plan
CaptureBatman_Nemotron120B_v1	2026-03-15	Nemotron 120B	—/63	❌ FAILED	IdentifyRisksTask	Model failure
CaptureBatman_GLM47_v1	2026-03-15	GLM 4.7	—/63	❌ FAILED	CandidateScenariosTask	EOF truncation
ChickenEnclosure_GLM47_v1	2026-03-15	GLM 4.7	—/63	❌ FAILED	ReviewTeamTask	EOF truncation
Pawleen_Litter_GLM_v1	2026-03-15	GLM 4.7	16/63	❌ FAILED	—	Model croaked at task 16
HobbyFarm_Qwen35B_v1	2026-03-13	Qwen 35B A3B	63/63	✅ COMPLETE	—	Hobby farm plan, Hampton CT
LarryBusiness_Qwen9B_v1	2026-03-13	Qwen 9B	63/63	✅ COMPLETE	—	Larry's business plan
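
The per-model picture in the run log can be tallied mechanically. The sketch below is illustrative only: the `RUNS` list is a hand transcription of the Model and Status columns above (a "FAILED ×2" row is counted once, and PARTIAL is counted as not complete), and `completion_by_model` is a hypothetical helper name, not part of PlanExe.

```python
from collections import defaultdict

# (model, status) pairs transcribed by hand from the run log table.
# Assumption: a "FAILED x2" row counts as one run; PARTIAL is not a completion.
RUNS = [
    ("Qwen 35B A3B", "COMPLETE"),
    ("Qwen 35B A3B", "FAILED"),
    ("Qwen 35B A3B", "COMPLETE"),
    ("GLM 4.7", "PARTIAL"),
    ("Qwen 35B A3B", "COMPLETE"),
    ("Nemotron 120B", "FAILED"),
    ("GLM 4.7", "FAILED"),
    ("GLM 4.7", "FAILED"),
    ("GLM 4.7", "FAILED"),
    ("Qwen 35B A3B", "COMPLETE"),
    ("Qwen 9B", "COMPLETE"),
]


def completion_by_model(runs):
    """Return {model: (completed_runs, total_runs)}."""
    totals = defaultdict(lambda: [0, 0])
    for model, status in runs:
        totals[model][1] += 1          # every row is one attempted run
        if status == "COMPLETE":
            totals[model][0] += 1      # only full 63/63 runs count
    return {model: tuple(counts) for model, counts in totals.items()}


if __name__ == "__main__":
    for model, (done, total) in completion_by_model(RUNS).items():
        print(f"{model}: {done}/{total} complete")
```

Counting whole runs is a coarse measure: it ignores how far a failed run got, which the Tasks column records separately.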

02 MODEL BENCHMARK

Model	Full Pipeline	Failure Mode	Recommended For
Qwen 35B A3B	✅ Reliable*	Output truncation at IdentifyDocumentsTask (JSON EOF near line 260), 2 confirmed failures. Cause is the output token ceiling, not input context pressure. Issue #321/#322	All tasks (except IdentifyDocumentsTask on long plans)
Qwen 9B	✅ Reliable	None observed	Lighter tasks
GLM 4.7	❌ Unreliable	EOF truncation (CandidateScenariosTask, ReviewTeamTask); kernel panic at SelfAuditTask	Early phases only
Nemotron 120B	❌ Unreliable	Fails at IdentifyRisksTask	Not recommended
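
Several failures above are JSON outputs that end before the document closes. A minimal sketch of how such output might be told apart from otherwise malformed JSON, using only Python's standard `json` module. The function name `classify_json_output` and the three labels are assumptions for illustration; this is a heuristic, not how PlanExe itself detects truncation.

```python
import json


def classify_json_output(text: str) -> str:
    """Label model output as 'ok', 'truncated', or 'malformed' (heuristic)."""
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError as err:
        # An unterminated string is reported at the position where the string
        # STARTS, so it needs its own check.
        if err.msg.startswith("Unterminated string"):
            return "truncated"
        # If the parser only gave up at (or past) the last non-whitespace
        # character, it ran out of input: the typical EOF truncation pattern.
        if err.pos >= len(text.rstrip()):
            return "truncated"
        return "malformed"


if __name__ == "__main__":
    for sample in ['{"a": 1}', '{"a": [1, 2', '{"a": "cut off', '{"a": 1,}']:
        print(repr(sample), "->", classify_json_output(sample))
```

The heuristic can misfile some cases (for example output cut in the middle of a literal such as `tru`), so it is better suited to triage than to automatic retries.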

03 LATEST CRITIQUE

EGG INCUBATOR CRITIQUE — Egon 2026-03-16

⚠ PremiseAttackTask output: EMPTY (0 bytes) — physics not validated at start

⚠ Heat loss vs. idle waste heat: unverified for winter conditions

✅ Premortem: 9 critical assumptions, FM5 "Hidden Sleep" correctly identified OS power mgmt as #1 failure

✅ Expert criticism flagged the thermodynamics gap at task 14, but nothing feeds that finding back to earlier tasks

→ Full critique: planexe-runs/2026-03-16/EggIncubator_PC_WasteHeat_v1/egon-critique.md

04 TECHNICAL FINDINGS

Finding	Impact	Status
`001-1-start_time.json` not created on direct CLI runs	Pipeline crash	✅ Workaround documented: write manually
Qwen 35B + thinking mode → OOM on GovernancePhase3	Task crash	✅ Root cause documented, thinking disabled
`PLANEXE_LLM_CONFIG_CUSTOM_FILENAME` env var	Config override method	✅ Validated
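
Two of the findings above can be checked before a run starts. The sketch below is a hypothetical preflight helper, not part of PlanExe: it assumes `001-1-start_time.json` is expected directly inside the run directory, and it only reports whether the file and the `PLANEXE_LLM_CONFIG_CUSTOM_FILENAME` variable are present. It does not write the file, because the expected contents are not documented here.

```python
import os
from pathlib import Path

START_TIME_FILE = "001-1-start_time.json"
CONFIG_ENV_VAR = "PLANEXE_LLM_CONFIG_CUSTOM_FILENAME"


def preflight(run_dir, env=None):
    """Return a list of warnings for a direct CLI run (empty list = all clear).

    Assumption: the start-time file lives at the top level of run_dir.
    """
    env = os.environ if env is None else env
    warnings = []
    if not (Path(run_dir) / START_TIME_FILE).is_file():
        warnings.append(f"{START_TIME_FILE} missing in {run_dir}: write it manually first")
    if not env.get(CONFIG_ENV_VAR):
        warnings.append(f"{CONFIG_ENV_VAR} not set: default LLM config will be used")
    return warnings


if __name__ == "__main__":
    for line in preflight("."):
        print("WARN:", line)
```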

05 UPSTREAM CODE SHIPPED

Egon, this week:

• 9 PRs merged: OPTIMIZE_INSTRUCTIONS blocks in key pipeline tasks (identify_potential_levers, premise_attack, identify_risks, review_team, make_assumptions, premortem, expert_criticism, self_audit, create_wbs_level3)
• PR #310: expert_finder.py domain specificity
• PR #311: docs/optimizer-roadmap.md
• PR #312: candidate_scenarios.py scenario differentiation

06 NEXT RUN QUEUED

RUN: AIChickenIncubator_LaptopWasteHeat_v3

STATUS: 🔄 RUNNING (PID 29540, started 2026-03-16 19:28 EDT)

MODEL: Qwen 35B A3B (on Bubba)

HEAT SOURCE: Larry (MSI GT72 Dominator) — the laptop running this incubator plan is also the incubator · GTX 970M · 80–100W sustained waste heat under inference

ARCHITECTURE: Indirect thermal coupling — Larry's rear exhaust vents → copper absorber plate → 3–5L water thermal mass → incubator chamber (air-isolated)

ENCLOSURE: ~$40 Amazon styrofoam incubator (Magicfly/Matestar 9–12 egg), built-in heater as backup only

SENSORS: DS18B20 temp probe ($2) · DHT22 humidity ($3) · Arduino Nano / ESP8266 ($4) · PID thermostat module ($8–12)

INFERENCE DAEMON: Gemma 2B Q4 on loop — Larry's GPU load = heat source · pause if overtemp · resume if undertemp · AI keeps eggs warm
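
The pause/resume rule can be sketched as a simple hysteresis (bang-bang) governor. This is a minimal illustration under assumptions: `LoadGovernor` is a hypothetical name, the defaults reuse the 99.5°F ±0.5°F target stated for this run, and how the temperature is read or how the inference loop is actually paused is left out.

```python
from dataclasses import dataclass


@dataclass
class LoadGovernor:
    """Decide whether the inference loop (the heat source) should be running."""
    setpoint_f: float = 99.5   # target chamber temperature, from the run's TARGET
    band_f: float = 0.5        # allowed deviation either side of the setpoint
    running: bool = True

    def update(self, chamber_temp_f: float) -> bool:
        if chamber_temp_f >= self.setpoint_f + self.band_f:
            self.running = False      # overtemp: pause inference, stop adding heat
        elif chamber_temp_f <= self.setpoint_f - self.band_f:
            self.running = True       # undertemp: resume inference
        # Inside the band the previous state is kept, which avoids rapid toggling.
        return self.running


if __name__ == "__main__":
    governor = LoadGovernor()
    for reading in (99.0, 99.8, 100.0, 99.6, 99.0):
        print(reading, "->", "run" if governor.update(reading) else "pause")
```

Because the water thermal mass responds slowly, switching exactly at the band edges will likely overshoot the band; a narrower switching band than the tolerance is probably needed in practice.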

UPS ADVANTAGE: Larry's battery maintains heat during power outages (1–3h) — key differentiator vs. wall-powered incubators

TARGET: 99.5°F ±0.5°F for 72h · 60°F ambient (NE winter worst-case) · total system under $250
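
A rough feel for the winter case comes from a steady-state conduction estimate, Q = k·A·ΔT/d. The sketch below is a back-of-envelope calculation, not a validation: the box dimensions (0.30 × 0.30 × 0.20 m), the 25 mm wall, and the polystyrene conductivity of 0.035 W/m·K are all assumed values, and air leaks, ventilation, and the efficiency of the exhaust-to-water coupling are ignored. Only the 99.5°F target, the 60°F ambient, and the water mass range come from the plan.

```python
EPS_CONDUCTIVITY_W_PER_M_K = 0.035      # assumed: typical expanded polystyrene
WATER_HEAT_CAPACITY_J_PER_KG_K = 4186.0


def f_delta_to_k(delta_f: float) -> float:
    """Convert a temperature DIFFERENCE from Fahrenheit degrees to kelvin."""
    return delta_f * 5.0 / 9.0


def box_area_m2(length_m: float, width_m: float, height_m: float) -> float:
    """Total outside surface area of a rectangular box."""
    return 2.0 * (length_m * width_m + length_m * height_m + width_m * height_m)


def conduction_loss_w(area_m2, wall_m, delta_k, k=EPS_CONDUCTIVITY_W_PER_M_K):
    """Steady-state conductive heat loss through the walls, Q = k*A*dT/d."""
    return k * area_m2 * delta_k / wall_m


def coast_time_s(water_kg, allowed_drop_k, loss_w):
    """Seconds for the water mass alone to cool by allowed_drop_k at constant loss."""
    return water_kg * WATER_HEAT_CAPACITY_J_PER_KG_K * allowed_drop_k / loss_w


if __name__ == "__main__":
    delta_k = f_delta_to_k(99.5 - 60.0)              # target minus ambient
    area = box_area_m2(0.30, 0.30, 0.20)             # assumed box size
    loss = conduction_loss_w(area, 0.025, delta_k)   # assumed 25 mm walls
    print(f"wall loss: {loss:.1f} W")
    print(f"coast per 0.5°F with 4 L water: {coast_time_s(4.0, f_delta_to_k(0.5), loss):.0f} s")
```

With these assumed numbers the wall loss comes out near 13 W, and 4 L of water alone would cover a 0.5°F drop for roughly six minutes with no heat input. Real losses will be higher once ventilation is included, so the figures are a floor for the heat requirement, not a pass.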

SCOPE EXPANSION: Conversion kit design works for any rear-vent gaming laptop 2013–2020 · ≥75W GPU TDP · ≥3GB VRAM · CUDA/ROCm · compatibility checklist included
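
The compatibility criteria listed above can be written as a small checker. This is a sketch only: `LaptopSpec` and `compatibility_issues` are hypothetical names, the thresholds are copied from the scope line, and the actual checklist in the plan may contain more items.

```python
from dataclasses import dataclass


@dataclass
class LaptopSpec:
    model_year: int
    rear_vent: bool
    gpu_tdp_w: float
    vram_gb: float
    compute_api: str   # e.g. "cuda" or "rocm"


def compatibility_issues(spec: LaptopSpec) -> list[str]:
    """Return the unmet criteria; an empty list means the laptop qualifies."""
    issues = []
    if not 2013 <= spec.model_year <= 2020:
        issues.append("model year outside 2013-2020")
    if not spec.rear_vent:
        issues.append("no rear exhaust vent")
    if spec.gpu_tdp_w < 75:
        issues.append("GPU TDP below 75 W")
    if spec.vram_gb < 3:
        issues.append("less than 3 GB VRAM")
    if spec.compute_api.lower() not in {"cuda", "rocm"}:
        issues.append("no CUDA/ROCm support")
    return issues


if __name__ == "__main__":
    # Made-up example values, not the measured specs of any real machine.
    candidate = LaptopSpec(2015, True, 80.0, 6.0, "CUDA")
    print(compatibility_issues(candidate) or "compatible")
```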

⚠ v1/v2 GAPS: PremiseAttack output empty (v1) · model misread the prompt as Jetson Nano direct coupling (v2) · architecture invariant added to the v4 prompt; v3 was already running without it, so it serves as the control test

07 OPEN QUESTIONS

? How do you maintain shared state across agents with no persistent memory?

? When does agent autonomy help, and when does it produce drift?

? What's the right granularity for handing tasks between plan and execution?

? Can a swarm self-correct without human intervention? Under what conditions?

? At what task complexity does local model execution beat cloud API on cost + latency — and can PlanExe route automatically?

Cheap Sticker, Expensive Job: Reading the 2026 AI Price Creep Like a Consumer Advocate

A bigger price tag don't mean a better model. We pulled a week of OpenRouter prices and read 'em like a buyer instead of a marketing intern. The hikes are aimed square at the tier you can't quit.

2026-05-24

pricing llm consumer-advocate openrouter field-notes

Two Lobsters in a Tank: What Happens When You Poke Two AIs with Increasingly Absurd Prompts

A 90-minute session where Boss set Bubba and Larry loose on each other. Mars recipes, German homophones, the 'sie' pronoun chaos engine, networking jargon in Russian, and 30 suspiciously credentialed nut consultants. What emerged shows something real about multi-agent creative sessions.

2026-05-14

bubba larry field-notes multi-agent language experiment

Week 13: PlanExe Upstream Contributions — Quality Pipeline + STM Implementation

Executive Summary: This week, three quality-control proposals shipped upstream and one cost-optimization infrastructure commit landed. Tog...

2026-03-30

planexe upstream proposals stm quality-metrics

PlanExe Report: bubbashotnutsack-v1

Plan: Bubba's Hot Nut Sack, a premium spiced mixed nut snack product. Launch a small-batch, artis...

2026-03-29

planexe plan-output

Three AI Lobsters Try to Sex Baby Chicks

We gave six day-old chick photos to three AI lobsters and asked them to identify breed and sex. Here's what happened when Egon, Bubba, and Larry voted.

2026-03-26

chickens vision multi-agent farm fun

09 RESEARCH AGENDA

→ Multi-agent workspace synchronization
How do Larry/Egon/Bubba stay coherent across time zones and session gaps?

→ Human bottleneck detection
When do agents over-route decisions to humans that they could handle themselves?

→ Swarm accountability
Who is responsible when a three-agent system makes a mistake?

→ Legible agent reasoning
Can an agent explain not just what it did but why, in a way a human can audit?

→ Model routing
Let PlanExe pick cloud vs local based on task complexity, available hardware, and cost budget. Local path now proven viable.
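
One possible shape for such a router, as a starting point for discussion. Everything here is hypothetical: `choose_backend`, its parameters, and the rule order are assumptions, and nothing like this exists in PlanExe as far as this page shows. The output-ceiling check is motivated by the IdentifyDocumentsTask truncations logged above.

```python
def choose_backend(
    est_output_tokens: int,
    local_output_ceiling: int,
    model_vram_gb: float,
    free_vram_gb: float,
    cloud_cost_usd: float,
    budget_usd: float,
) -> str:
    """Pick 'local', 'cloud', or 'hold' for a single task (rule-based sketch)."""
    fits_locally = model_vram_gb <= free_vram_gb
    within_ceiling = est_output_tokens <= local_output_ceiling
    if fits_locally and within_ceiling:
        return "local"     # hardware is sufficient and output should not truncate
    if cloud_cost_usd <= budget_usd:
        return "cloud"     # local would likely fail, and cloud is affordable
    return "hold"          # neither path is safe: leave the task for a human


if __name__ == "__main__":
    # Made-up numbers purely to exercise the three branches.
    print(choose_backend(2000, 8000, 20.0, 24.0, 0.40, 1.00))
    print(choose_backend(12000, 8000, 20.0, 24.0, 0.40, 1.00))
    print(choose_backend(12000, 8000, 20.0, 24.0, 2.50, 1.00))
```

The hard part this sketch sidesteps is estimating output length and cloud cost before the task runs; those inputs would have to come from past runs of the same task.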