v0.2.4 · for windows · just out

Talk to your PC.
It listens.

Press a hotkey, say what you want, and the words land where your cursor is. No cloud. No accounts. No telemetry. Sweet, fast, and stupid private.

5.4% word error rate
~30× faster than realtime · CPU
100% on your machine
100% local · no telemetry · always free
v0.2.4 · 9.0 MB · Windows 10 / 11 (x64)
App at a glance

One small window. The pill does the rest.

Open the app once to pick languages and hotkeys. After that it lives in the tray; you'll never need the main window again unless you want to add a snippet or check yesterday's transcripts.

[App window, tellmebaby v0.2.4: sidebar with Talk, Read aloud, Edit selection, Recent transcripts, Dictionary, Snippets, Per-app modes, Activation, Speech & AI]
Talk

Hold the hotkey, talk, release.

Words appear at your cursor. Five hotkeys cover the whole flow.

Dictate
Hold to talk, release when you're done.
Ctrl+Shift+Space
Read selection
Highlight first, then press. Hear it through your speakers.
Ctrl+Shift+Alt+R
Edit selection
Select text, hold + speak ("translate to French"), release.
Ctrl+Shift+Alt+E

That's the whole UI. The pill does the heavy lifting.

How it feels

Hold the hotkey. Talk. Watch the words show up.

Anything that takes text takes tellmebaby. Slack. Outlook. VS Code. Notion. Your terminal. Even sketchy little admin tools nobody else supports.

What it does

One key turns your voice into a tool.

Six small skills, one tiny pill. Designed to feel like a friend hanging around in the corner, not a dashboard taking over your life.

Dictate, anywhere

Hold your hotkey, talk, release. The words land at your cursor. Slack, VS Code, Outlook, your terminal — anything that takes typed text takes voice now.

Read it back

Highlight a paragraph, hit a hotkey, hear it. Same for the clipboard, or any chunk of text you paste in. Your computer talks back, in a real voice that doesn't sound robotic.

Edit by voice

Select text, press the edit hotkey, say "translate to French" or "make this shorter and less formal." The selection rewrites in place — no copy-paste shuffling.

Custom dictionary

Got a name the recognizer keeps butchering? Add it to the dictionary. Snippets map voice phrases to fixed text: addresses, signatures, prompts you reuse. The more entries you add, the closer the output gets to how you actually write.

Per-app modes

Clean up filler words in Slack but keep them verbatim in your terminal. Different apps get different treatment automatically — no settings to remember when you switch windows.
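
A minimal sketch of how a per-app mode could be applied, assuming the mode is looked up by the focused app's executable name. The `APP_MODES` table, the mode names, and the filler list are illustrative assumptions, not tellmebaby's actual configuration format.

```python
import re

# Hypothetical per-app table: executable name -> mode (assumed names).
APP_MODES = {
    "slack.exe": "clean",
    "windowsterminal.exe": "verbatim",
}
DEFAULT_MODE = "clean"

# Illustrative filler words only; a real list would be longer and per-language.
FILLERS = re.compile(r"\b(?:um+|uh+|erm+)\b[,.]?\s*", re.IGNORECASE)


def mode_for(exe_name: str) -> str:
    """Return the mode for the focused app, falling back to the default."""
    return APP_MODES.get(exe_name.lower(), DEFAULT_MODE)


def apply_mode(text: str, exe_name: str) -> str:
    """Strip filler words in 'clean' mode; return the text untouched otherwise."""
    if mode_for(exe_name) != "clean":
        return text
    cleaned = FILLERS.sub("", text)
    # Collapse any double spaces left behind by removed fillers.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Keying on the executable name keeps the lookup cheap enough to run on every utterance, which is what would let the mode follow the focused window without any manual switching.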

Maximum-quality models

Parakeet TDT for English. Whisper Large v3 Turbo for everything else. INT8 on CPU, GPU acceleration where there is one. No "lightweight model" excuses for bad accuracy.

The actual numbers

SOTA models, on your CPU, no asterisks.

tellmebaby ships two ASR models that pick themselves automatically based on the languages you speak. Both rank near the top of the public Hugging Face Open ASR leaderboard. Both run in INT8 on a normal laptop CPU at multiples of real-time.

Model                     WER ↓    Speed (RTF) ↑   Languages                    Size
Parakeet TDT 0.6B v3      ~6.3%    ~30×            25 EU (incl. English)        660 MB
Cohere Transcribe 2B      ~5.4%    ~12×            14 (EU + zh/ja/ko/vi/ar)     1.7 GB
Whisper Large v3 Turbo    ~7.4%    ~4×             99                           1.0 GB
Whisper Large v3 (full)   ~7.7%    ~1×             99                           3.1 GB
WER is from the Hugging Face Open ASR Leaderboard (average across 8 English benchmarks; lower is better). Speed is a multiple of real time (seconds of audio transcribed per second of compute; higher is better), measured on an 8-core x86 CPU with INT8-quantized weights on the sherpa-onnx runtime. Whisper Large v3 (full) is shown for context only: we don't bundle it, and users who need 99-language coverage stay on Turbo.
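
To make the speed column concrete, here is a small worked calculation, assuming the speed figures are multiples of real time as the "faster than realtime" label suggests. The function name is hypothetical and the numbers are the approximate values from the table.

```python
def transcription_seconds(audio_seconds: float, speed_multiple: float) -> float:
    """Wall-clock seconds needed to transcribe a clip at a given multiple of real time."""
    if speed_multiple <= 0:
        raise ValueError("speed_multiple must be positive")
    return audio_seconds / speed_multiple


# Approximate speed multiples taken from the table above.
SPEEDS = {
    "Parakeet TDT 0.6B v3": 30,
    "Cohere Transcribe 2B": 12,
    "Whisper Large v3 Turbo": 4,
}

for model, speed in SPEEDS.items():
    seconds = transcription_seconds(60, speed)
    print(f"{model}: 60 s of audio in about {seconds:.0f} s")
```

On those figures, a one-minute dictation would take roughly 2, 5, and 15 seconds of compute respectively on the benchmark CPU.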
First five minutes

One question. Then it's already working.

Onboarding is intentionally tiny: tell us which languages you speak. We auto-pick the right ASR model, download it in the background, and you're dictating before the install bar finishes filling.

step 1 of 1

Which languages do you speak?

Pick all that apply. We'll grab a model that handles them well — no model picker to fuss with later.

English
Deutsch
Français
Español
Italiano
Português
日本語
한국어
中文
→ Cohere Transcribe · 1.7 GB · ~5.4% WER
Get started

That's the whole onboarding.
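
The automatic pick could look something like the sketch below. This is a guess at the selection rule, not the app's actual logic: it assumes the smallest model that covers every chosen language wins, and the language sets are deliberately partial stand-ins for the real 25- and 14-language lists.

```python
# Illustrative, partial language sets (assumptions). The real coverage lists
# are longer: 25 and 14 languages according to the model table.
PARAKEET_LANGS = {"en", "de", "fr", "es", "it", "pt"}
COHERE_LANGS = PARAKEET_LANGS | {"zh", "ja", "ko", "vi", "ar"}


def pick_model(spoken: set[str]) -> str:
    """Pick the smallest listed model that covers every selected language."""
    if not spoken:
        raise ValueError("pick at least one language")
    if spoken <= PARAKEET_LANGS:
        return "Parakeet TDT 0.6B v3"
    if spoken <= COHERE_LANGS:
        return "Cohere Transcribe 2B"
    # Fallback for anything outside the two sets above.
    return "Whisper Large v3 Turbo"
```

Under this rule, ticking English plus 日本語 would land on Cohere Transcribe, which matches the model shown in the onboarding mock above.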

Choreography

Five hotkeys. Zero menus.

Picked so they don't fight your browser, your editor, or Windows itself. Ctrl+Shift+Alt + a letter is essentially never claimed by anything else — so the chords stay yours.

Ctrl+Shift+Space Dictate. Tap once to start, tap again to stop — or switch to hold-to-talk in Settings. Words appear at the cursor.
Ctrl+Shift+Alt+R Read selection. Highlight first, then press. The computer reads it through your speakers.
Ctrl+Shift+Alt+V Read clipboard. Whatever you copied last gets read aloud, no app switch needed.
Ctrl+Shift+Alt+E Edit selection. Select text, hold and say what to change ("translate to French", "make this shorter"). Rewritten in place on release.
Ctrl+Shift+Alt+S Stop reading. Cuts off any in-progress read-aloud immediately. Useful for long paragraphs you only needed the first sentence of.

all five are remappable in Settings → Activation
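
If you remap a chord, a quick collision check along these lines can help. It is only a sketch: the reserved list holds just the two browser shortcuts and the Win-key rule mentioned in the FAQ, and `normalize` and `is_free` are hypothetical helpers, not part of tellmebaby.

```python
def normalize(chord: str) -> frozenset[str]:
    """Turn 'Ctrl+Shift+R' into an order-independent, case-insensitive key set."""
    return frozenset(part.strip().lower() for part in chord.split("+") if part.strip())


# Chords the FAQ names as already taken by browsers (not an exhaustive list).
RESERVED = {normalize(c) for c in ("Ctrl+Shift+R", "Ctrl+Shift+V")}


def is_free(chord: str) -> bool:
    """Return False for chords that collide with a known reservation."""
    keys = normalize(chord)
    # Win + letter is reserved by the Windows shell, so treat any Win chord as taken.
    if "win" in keys:
        return False
    return keys not in RESERVED
```

Comparing key sets rather than strings means "Shift+Ctrl+R" and "Ctrl+Shift+R" count as the same chord.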

Hold Esc while the pill is active to throw away whatever you just said.

It's actually different

Built around your hands, not your eyes.

The pill is the whole UI.

One small surface, bottom-center of your screen. It tells you what's happening — listening, transcribing, done — without stealing your attention. No window, no panel, no dashboard. Hover it for the hotkey hint or click to open the actual app.


It learns the words you actually say.

Names of coworkers, internal tool names, that one client whose name nobody can spell. Add them once to your dictionary; the recognizer biases toward them forever. No model retraining, no cloud roundtrip.

// dictionary
Aghil → Aghil
Tellmebaby → tellmebaby
Cloudflare → Cloudflare
K8s → Kubernetes
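
One way to picture the dictionary is as a table from what was heard to what should be written. The sketch below applies such a table as a post-processing pass. Note the difference from the description above: tellmebaby says it biases the recognizer itself, which this replacement pass does not do. `DICTIONARY` and `apply_dictionary` are hypothetical names.

```python
import re

# Example entries in the spirit of the dictionary shown above.
DICTIONARY = {"K8s": "Kubernetes", "Tellmebaby": "tellmebaby"}


def apply_dictionary(text: str, entries: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each key with its value."""
    if not entries:
        return text
    # Longest keys first so overlapping entries prefer the longer match.
    keys = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in entries.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

A single compiled pattern keeps the pass linear in the transcript length, however many entries the dictionary holds.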

Your voice stays here.

The recognizer runs entirely on your CPU or GPU. No microphone audio is ever sent over the network. We say "100% local" and we mean it — pull your network cable mid-sentence and tellmebaby keeps working.

// network log
initial model fetch — once
update check — every 6h
microphone audio — never
transcript text — never
telemetry — never
vs. the alternatives

Why local at all?

Cloud dictation is fast and accurate. So is tellmebaby — without sending your voice anywhere. Here's the actual difference.

Cloud dictation

  • × Uploads your audio for every utterance
  • × Subscription, eventually
  • × Stops working without internet
  • × Privacy policy you'll never read
  • × Can disappear when funding runs out

tellmebaby

  • Audio never leaves your machine
  • Free, no account, no signup
  • Works on a plane, in a tunnel, anywhere
  • No data to leak because we don't have any
  • Open enough to fork — your install is forever
Why you can actually trust this

The receipts.

"100% local" is easy to claim. Here's the technical reality so you can verify rather than take our word for it.

The audio path is auditable

The recognizer is sherpa-onnx, an open-source ONNX Runtime wrapper that runs INT8 model weights on your CPU. No code we write touches a network socket during transcription. Open Wireshark; watch nothing happen.

Updates are signed

Every release is signed with an ed25519 key (separate from Authenticode) and the public half is embedded in the app. The auto-updater refuses to install anything that doesn't verify against it. You can't be tricked into installing a malicious "update."

Verifiable downloads

SHA256 of every release is published right next to the download button. certutil -hashfile setup.exe SHA256 on Windows or shasum -a 256 on Mac/Linux confirms what you got matches what we shipped — no MITM, no tampered binaries.
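
The same check can be scripted. A minimal sketch using only Python's standard library follows; the function names are hypothetical, and the published hash is whatever appears next to the download button.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large installers never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str) -> bool:
    """Compare the file's SHA256 with the published value, ignoring case and whitespace."""
    return sha256_of(path) == published_hex.strip().lower()
```

Call `matches_published(Path("tellmebaby_0.2.4_x64-setup.exe"), "<hash from the page>")` and only run the installer if it returns `True`.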

Apache 2.0 model licenses

Both default models — Parakeet TDT v3 (NVIDIA) and Cohere Transcribe (Cohere Labs) — ship under Apache 2.0. You can fork tellmebaby, swap the models, redistribute. No vendor can pull the rug.

Local-only data, in plain sight

Recordings live as .wav in %USERPROFILE%\.tellmebaby\recordings\. Transcripts live in a SQLite DB you can open with any client. Settings are JSON. Nothing's compressed, nothing's hidden, nothing's "cloud-synced for your convenience." Delete the folder; reset complete.
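
Because the transcript store is plain SQLite, you can inspect it without knowing its layout. The sketch below lists every table and its row count. It assumes nothing about the schema, and the database file name is not stated here, so the path you pass in is up to you to locate.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path: Path) -> dict[str, int]:
    """Return {table name: row count} for every user table in a SQLite file."""
    # Open read-only through a URI so inspecting the file cannot modify it.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        counts = {}
        for name in names:
            # Double any embedded quotes so the identifier is quoted safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Opening in read-only mode is a deliberate choice: it lets you look around while the app is installed without any risk of altering your history.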

Free without a moat

No paid tier planned. No telemetry to monetize. If that ever changes, existing installs keep working at no cost. The version you have now is yours to keep.

Everything stays on your machine.

No cloud accounts. No telemetry. No analytics beacons. The only network traffic is the speech-model download the first time you pick a language, plus a periodic update check. The models are open-weight files you can delete and re-fetch any time. Your voice is yours.

No microphone uploads · No transcript uploads · No analytics · No accounts · Works offline
Built for

People who'd rather talk than type.

If you've ever finished an email and realized your hands are tired, tellmebaby is for you. If you haven't, it's for you anyway.

Writers — drafting at speech speed

You think faster than you type. Talk through your draft, then clean it up with the keyboard. tellmebaby gets the messy first version out so editing is the only work left.

Devs — for the parts that aren't code

Commit messages, doc strings, Slack replies, code reviews. The 40% of dev time that's English instead of code is where dictation pays off — your hands stay on the keyboard, your voice does the typing.

Multitaskers — for when your hands are busy

Eating lunch. Holding the baby. Pet on the lap. Whatever's happening, you can still get a paragraph out. tellmebaby doesn't care if your fingers are sticky.

RSI / accessibility — hands-light, by design

The hotkey is "any modifier you can mash with one finger." Custom dictionary, fast cancel, everything works one-handed. Edit Mode means you don't even need to retype to revise.

One download. No account, no card.

Run the installer, pick the languages you speak, and start dictating in under five minutes. Updates are automatic and signed.

Download v0.2.4 for Windows
tellmebaby_0.2.4_x64-setup.exe · 9.0 MB · Windows 10 / 11 (x64)
sha256 · 3247927baca97e2914a99fcf82ac759525e975424f6b7c2b65acee5d30b56fe3
Real questions

The stuff people actually ask.

Why does Windows say "Windows protected your PC" when I run it?
That's SmartScreen. It warns about apps it hasn't seen before. Code-signing certs cost a few hundred dollars a year, and we haven't bought one yet. Click More info → Run anyway and the install proceeds normally. A SHA256 is published right next to the download button, so you can verify the file before you run it.
Is my voice actually private? For real?
Yes. Speech recognition runs entirely on your CPU or GPU using locally-stored model files. Pull your network cable mid-sentence and tellmebaby keeps working. The only network traffic is the initial speech-model download (for the languages you pick) plus a JSON poll every six hours to see if there's an update. That's the entire network footprint.
How accurate is it, really?
English: very good. Parakeet TDT 0.6B v3 is one of the best open-weight ASR models available right now, INT8-quantized but still benchmark-competitive with closed cloud APIs. Multilingual: Whisper Large v3 Turbo, also INT8. We pick the right model automatically based on the languages you say you speak in onboarding.
Where does it store my recordings and transcripts?
%USERPROFILE%\.tellmebaby — recordings as .wav, transcripts in a SQLite database, settings as JSON. Nothing's compressed, nothing's encrypted, nothing's hidden. You can browse it all directly in File Explorer. Deleting that folder is a complete reset.
Does it auto-update?
Yes. The app polls a signed manifest every six hours. When a new version is out, you get a banner across the top of the main window with a one-click Install + restart. Updates are signed with an ed25519 key, so you can't be tricked into installing a fake one.
What if I want it to ignore my microphone in some apps?
That's what per-app modes are for. Set "passthrough" mode for, say, your password manager — the hotkey doesn't do anything when that app is focused. You can also pause the hotkey globally with one click in the sidebar.
Why such weird hotkey combos?
Because Windows reserves Win + letter at the shell level (so we never see Win+R, etc.), and browsers eat Ctrl+Shift+letter shortcuts (Ctrl+Shift+R reloads, Ctrl+Shift+V pastes plain text). Ctrl+Shift+Alt + letter is essentially never claimed by anything else, so the chords stay yours no matter which app is in front. Awkward to type — but they're usually held briefly on one hand. Re-bind to whatever you prefer in Settings → Activation.
macOS / Linux?
Not yet. The Windows build uses WASAPI for audio capture and Win32 keyboard hooks for the global hotkey — porting those to macOS/Linux is real engineering work, not a config flip. Windows-only for now; other platforms once the Windows experience is rock-solid.
What does it cost?
Free. Forever, on Windows. There's no paid tier planned, no "pro" features locked behind a paywall, no telemetry to monetize. If that ever changes, anyone who installed before the change keeps the version they had, still free.
Is there a hidden catch?
No. tellmebaby is a personal project that exists because the maintainer wanted a local-first dictation tool with a brain and couldn't find one. You're welcome to use it, share it, fork the source if you want to. There's no "actually it phones home for analytics" footnote.