Streams & Encoding
The data contract between fileservers, kernel, and processes. Two layers: raw bytes at the bottom, string convenience on top.
Wire Format
Decision: `Uint8Array` is the universal wire format. Every fileserver `read()` returns `Uint8Array`, and every `write()` accepts `Uint8Array`. The kernel never interprets contents. This keeps the protocol honest: a fileserver for `/dev/random` and a fileserver for `/home` use exactly the same types. Binary correctness comes for free, with no special-casing.
```typescript
// Fileserver protocol (bytes only)
interface Fileserver {
  read(fd: unknown, offset: number, count: number): Promise<Uint8Array>
  write(fd: unknown, offset: number, data: Uint8Array): Promise<number>
  // ...
}
```
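To make the byte-only contract concrete, here is a minimal sketch of an in-memory fileserver. It covers only the `read`/`write` half of the protocol; the `MemFileserver` class, its `open()` method, and the `{ path }` handle shape are hypothetical, not part of the real protocol. Contents go in and out as `Uint8Array` and are never inspected, so text and binary data take exactly the same path.

```typescript
// Subset of the fileserver protocol above (read/write only).
interface ByteFileserver {
  read(fd: unknown, offset: number, count: number): Promise<Uint8Array>
  write(fd: unknown, offset: number, data: Uint8Array): Promise<number>
}

// Hypothetical in-memory fileserver: opaque handles map to byte buffers.
class MemFileserver implements ByteFileserver {
  private files = new Map<string, Uint8Array>()

  // Hypothetical open(): the kernel treats the returned handle as opaque (fd: unknown).
  open(path: string): { path: string } {
    if (!this.files.has(path)) this.files.set(path, new Uint8Array(0))
    return { path }
  }

  async read(fd: unknown, offset: number, count: number): Promise<Uint8Array> {
    const buf = this.files.get((fd as { path: string }).path) ?? new Uint8Array(0)
    // slice() copies, so callers cannot mutate stored bytes; an empty result means EOF.
    return buf.slice(offset, offset + count)
  }

  async write(fd: unknown, offset: number, data: Uint8Array): Promise<number> {
    const path = (fd as { path: string }).path
    const old = this.files.get(path) ?? new Uint8Array(0)
    // Grow the buffer if the write extends past the current end.
    const next = new Uint8Array(Math.max(old.length, offset + data.length))
    next.set(old)
    next.set(data, offset)
    this.files.set(path, next)
    return data.length
  }
}
```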
```typescript
// Kernel fd operations (bytes only)
kernel.read(pid: number, fd: number, count: number): Promise<Uint8Array>
kernel.write(pid: number, fd: number, data: Uint8Array): Promise<number>
```

Encoding
Decision: UTF-8, not configurable. The string convenience layer assumes UTF-8. There's no use case for other encodings in an LLM shell environment: LLMs speak UTF-8. If a bin needs to handle arbitrary encodings, it works with raw bytes and brings its own decoder. No encoding parameter on streams, no codepage tables.
Shared instances, created once:
```typescript
const encoder = new TextEncoder()        // always UTF-8
const decoder = new TextDecoder('utf-8') // always UTF-8; shared only for one-shot (non-streaming) decodes
```

Two-Layer Architecture
```
┌─────────────────────────────────────────────┐
│ String Layer (ProcContext)                  │
│   proc.stdin → async iterable of lines      │
│   proc.stdout.write(string) → auto-encodes  │
│   proc.stderr.write(string) → auto-encodes  │
├─────────────────────────────────────────────┤
│ Byte Layer (Kernel)                         │
│   kernel.read(pid, fd, count) → Uint8Array  │
│   kernel.write(pid, fd, data) → byte count  │
└─────────────────────────────────────────────┘
```

Bins use the string layer by default. Bins that need raw bytes drop down to `proc.fs.read()` / `proc.fs.write()`, which go straight to kernel byte ops.
String Convenience Layer
This is the API surface bins actually interact with. Built on top of kernel byte operations.
stdout / stderr (Writable)
```typescript
interface Writable {
  write(data: string | Uint8Array): Promise<number>
}
```

- If `data` is a string → encode to UTF-8 via `TextEncoder`, then write bytes to kernel
- If `data` is a `Uint8Array` → write bytes directly
- Returns the number of bytes written (for a string, its UTF-8 byte length, not its `.length`)
Decision: accept both `string` and `Uint8Array` on write. There's no reason to force bins to encode manually. The common case is `proc.stdout.write("hello\n")`, and it should just work. Bins doing binary I/O pass a `Uint8Array` and skip encoding. One method, no overload confusion.
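A minimal sketch of how such a `Writable` could sit on top of the byte layer. The `ByteWrite` type and `makeWritable` factory are hypothetical stand-ins for `kernel.write(pid, fd, data)` with `pid` and `fd` already bound; the real wiring inside `ProcContext` may differ.

```typescript
interface Writable {
  write(data: string | Uint8Array): Promise<number>
}

// Hypothetical stand-in for kernel.write(pid, fd, data) with pid/fd bound.
type ByteWrite = (data: Uint8Array) => Promise<number>

const encoder = new TextEncoder() // always UTF-8

// Hypothetical factory: wraps a byte-level write in the string-friendly API.
function makeWritable(writeBytes: ByteWrite): Writable {
  return {
    async write(data: string | Uint8Array): Promise<number> {
      // Strings are encoded to UTF-8; a Uint8Array passes through untouched.
      const bytes = typeof data === 'string' ? encoder.encode(data) : data
      // The count comes from the byte layer, so it is in bytes, not characters.
      return writeBytes(bytes)
    },
  }
}
```

With this shape, `await out.write('ü\n')` resolves to 3, because `ü` encodes to two bytes.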
stdin (Readable)
```typescript
interface Readable {
  [Symbol.asyncIterator](): AsyncIterator<string> // line-by-line
  read(count?: number): Promise<Uint8Array>       // raw bytes
}
```

Two modes of consumption:
Line mode (default, async iterable):
```typescript
for await (const line of proc.stdin) {
  // line is a string, no trailing \n
}
```

Byte mode (explicit):
```typescript
const chunk = await proc.stdin.read(1024) // Uint8Array
```

Decision: line mode is the default, byte mode is opt-in. LLM bins overwhelmingly process text line by line. Making the async iterable yield lines matches what `grep`, `sed`, `awk`, etc. expect. Bins that need raw bytes (e.g., a binary file copier) call `.read()` directly.
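As an illustration of why lines are the default, a `grep`-style filter reduces to one loop. The `MiniProc` type and `grepBin` function are hypothetical; only the shapes of the `stdin` iterator and `stdout.write()` come from the interfaces above. Substring matching stands in for real pattern matching.

```typescript
// Hypothetical slice of ProcContext: just the pieces this bin uses.
interface MiniProc {
  stdin: AsyncIterable<string>
  stdout: { write(data: string | Uint8Array): Promise<number> }
}

// Hypothetical bin: copy every input line containing `pattern` to stdout.
// Returns the number of matching lines.
async function grepBin(proc: MiniProc, pattern: string): Promise<number> {
  let matches = 0
  for await (const line of proc.stdin) {
    if (line.includes(pattern)) {
      // Lines arrive without their trailing \n, so add it back on output.
      await proc.stdout.write(line + '\n')
      matches++
    }
  }
  return matches
}
```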
Line Splitting
Decision: the `Readable` layer splits lines, not the pipe or the consumer. The `Readable` wrapping stdin handles buffering partial UTF-8 sequences and splitting on `\n`. This logic lives in exactly one place, the stream layer, rather than being reimplemented in every bin.
Implementation sketch:
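```typescript
class LineReader {
  private remainder = ''
  // One streaming decoder per reader: it carries partial UTF-8
  // sequences over from one chunk to the next.
  private decoder = new TextDecoder('utf-8')

  constructor(
    private kernel: Kernel,
    private pid: number,
    private fd: number,
  ) {}

  async *[Symbol.asyncIterator](): AsyncGenerator<string> {
    while (true) {
      const chunk = await this.kernel.read(this.pid, this.fd, 4096)

      // EOF: flush the decoder, then emit any unterminated final line
      if (chunk.length === 0) {
        this.remainder += this.decoder.decode()
        if (this.remainder.length > 0) {
          yield this.remainder
          this.remainder = ''
        }
        return
      }

      const text = this.remainder + this.decoder.decode(chunk, { stream: true })
      const lines = text.split('\n')

      // Last element is either '' (text ended with \n) or a partial line
      this.remainder = lines.pop()!

      for (const line of lines) {
        yield line
      }
    }
  }

  // Raw byte mode. Bypasses text already buffered in `remainder` and in
  // the decoder, so don't mix it with line iteration on the same stream.
  async read(count?: number): Promise<Uint8Array> {
    return this.kernel.read(this.pid, this.fd, count ?? 4096)
  }
}
```

Details: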
- Read chunk size is 4096 bytes: large enough to be efficient, small enough not to buffer excessively
- Partial lines are buffered in `remainder` until the next `\n` arrives or EOF
- On EOF, any remaining text (no trailing `\n`) is yielded as a final line
- Lines are yielded WITHOUT the trailing `\n`, so bins don't have to strip it
- Each reader decodes with its own `TextDecoder` in streaming mode (`{ stream: true }`), which buffers partial code points, so multi-byte UTF-8 sequences that span chunk boundaries decode correctly
TextDecoder Streaming
One subtlety: a UTF-8 character can be split across two read() calls. TextDecoder with { stream: true } handles this:
```typescript
const decoder = new TextDecoder('utf-8')

// First read returns bytes [0xC3] (first byte of ü)
decoder.decode(new Uint8Array([0xC3]), { stream: true })       // returns ""

// Second read returns [0xBC, 0x41] (second byte of ü, then A)
decoder.decode(new Uint8Array([0xBC, 0x41]), { stream: true }) // returns "üA"
```

Decision: use a streaming `TextDecoder` in `LineReader`. Each `LineReader` instance creates its own `TextDecoder` and decodes with `{ stream: true }` to handle chunk-boundary splits correctly. Stateless decoding would corrupt multi-byte characters: the bytes on either side of the split decode to U+FFFD replacement characters. Because a streaming decoder carries state between calls, it cannot be the shared instance.
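The failure mode is easy to reproduce. The sketch below decodes the same two chunks, split in the middle of `ü`, once with an independent non-streaming call per chunk and once through a single streaming decoder. The `decodeChunks` helper is hypothetical and exists only for this comparison.

```typescript
// Hypothetical helper: decode chunks statelessly or with one streaming decoder.
function decodeChunks(chunks: Uint8Array[], streaming: boolean): string {
  let text = ''
  if (streaming) {
    const decoder = new TextDecoder('utf-8')
    for (const chunk of chunks) text += decoder.decode(chunk, { stream: true })
    text += decoder.decode() // flush: an incomplete trailing sequence becomes U+FFFD
  } else {
    // Each call sees its chunk in isolation, so a split character is lost.
    for (const chunk of chunks) text += new TextDecoder('utf-8').decode(chunk)
  }
  return text
}

// 'ü' is 0xC3 0xBC in UTF-8; the chunk boundary falls between those bytes.
const chunks = [new Uint8Array([0x68, 0xc3]), new Uint8Array([0xbc, 0x41])]

console.log(decodeChunks(chunks, true))  // hüA
console.log(decodeChunks(chunks, false)) // h, two U+FFFD replacement characters, then A
```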
What About proc.fs?
The proc.fs methods (proc.fs.read(), proc.fs.write()) are thin wrappers around kernel byte operations. They do NOT go through the string convenience layer:
```typescript
proc.fs.read(fd, count)  // → Uint8Array (raw bytes from kernel)
proc.fs.write(fd, data)  // → data must be Uint8Array
proc.fs.open(path, mode) // → fd number
proc.fs.close(fd)        // → void
```

Decision: `proc.fs` stays byte-only. The string layer is a property of stdin/stdout/stderr, the pre-opened streams that bins expect to carry text. Arbitrary file operations via `proc.fs` should be explicit about encoding. A bin reading a file can use `decoder.decode(await proc.fs.read(fd, n))` if it wants a string. This prevents binary files from being accidentally decoded as text and corrupted.
Helper: readFile / writeFile
For convenience, proc context includes high-level helpers that handle open/read/close and encoding:
```typescript
proc.fs.readFile(path: string): Promise<string>              // open, read all, decode, close
proc.fs.writeFile(path: string, data: string): Promise<void> // open, encode, write, close
```

These are sugar: they compose from `open`, `read`/`write`, `close`, and `TextEncoder`/`TextDecoder`. They assume UTF-8 text. Bins needing binary file I/O use the raw methods.
Decision: include readFile/writeFile on proc.fs. Bins frequently need to slurp or dump a whole file. Without these, every bin reimplements the open/read-loop/close pattern. Two helpers eliminate boilerplate without bloating the API.
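A sketch of how the two helpers might compose from the byte-level calls. The `ByteFs` interface, the `'r'`/`'w'` mode strings, the 4096-byte read size, and the short-write handling are assumptions for illustration; only the method names come from the listing above. The read loop stops at the first empty read (EOF) and uses one streaming decoder, so a character split across reads survives.

```typescript
// Hypothetical byte-only interface mirroring proc.fs.open/read/write/close.
interface ByteFs {
  open(path: string, mode: string): Promise<number>
  read(fd: number, count: number): Promise<Uint8Array>
  write(fd: number, data: Uint8Array): Promise<number>
  close(fd: number): Promise<void>
}

// Open, read until an empty chunk (EOF), decode as UTF-8, close.
async function readFile(fs: ByteFs, path: string): Promise<string> {
  const fd = await fs.open(path, 'r') // 'r' mode string is an assumption
  try {
    const decoder = new TextDecoder('utf-8') // fresh per call: it holds stream state
    let text = ''
    while (true) {
      const chunk = await fs.read(fd, 4096)
      if (chunk.length === 0) break
      text += decoder.decode(chunk, { stream: true })
    }
    return text + decoder.decode() // flush a trailing partial sequence, if any
  } finally {
    await fs.close(fd) // close even if a read throws
  }
}

// Open, encode as UTF-8, write until every byte is accepted, close.
async function writeFile(fs: ByteFs, path: string, data: string): Promise<void> {
  const fd = await fs.open(path, 'w') // 'w' mode string is an assumption
  try {
    let bytes: Uint8Array = new TextEncoder().encode(data)
    // Assumes write() returns the bytes accepted and may accept fewer than given.
    while (bytes.length > 0) {
      const n = await fs.write(fd, bytes)
      if (n <= 0) throw new Error(`write to ${path} made no progress`)
      bytes = bytes.subarray(n)
    }
  } finally {
    await fs.close(fd)
  }
}
```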
Summary of Boundaries
| Layer | Format | Who uses it |
|---|---|---|
| Fileserver protocol | Uint8Array | Fileserver implementations |
| Kernel fd ops | Uint8Array | Kernel internals |
| proc.fs.* | Uint8Array (except readFile/writeFile) | Bins doing explicit file I/O |
| proc.stdin iterator | string (lines) | Bins processing text input |
| proc.stdin.read() | Uint8Array | Bins processing binary input |
| proc.stdout.write() | string or Uint8Array | Bins producing output |
| proc.fs.readFile() | string | Bins reading whole text files |
Platform Capabilities
Platform adapters (nodeStdio, xtermStdio) expose optional capabilities beyond the core Readable interface. These are NOT implemented by kernel-internal streams (pipes, fd-backed readers).
Readable extensions
```typescript
interface Readable {
  // ... core methods ...

  /** Enter / exit raw mode. Platform adapters only. */
  setRawMode?(raw: boolean): void

  /** Non-blocking drain of buffered data. Platform adapters only. */
  tryRead?(count?: number): Uint8Array
}
```

`setRawMode?(raw: boolean): void` switches the adapter between line-buffered and raw (character-at-a-time) input. In raw mode, each keypress is delivered immediately without waiting for Enter. `wasmExec` calls this before running a program compiled with `ttyMode: 'raw'`, then restores line mode on exit.
`tryRead?(count?: number): Uint8Array` is a synchronous, non-blocking read. It returns whatever bytes are currently buffered, or an empty `Uint8Array` if there are none. `wasmExec` uses it in line mode to pre-buffer stdin before `callMain()`, because Emscripten's stdin callback is synchronous and cannot `await`. Kernel-internal `Readable` implementations MUST NOT implement this: doing so would drain pipe content before the WASM program can process it.
Decision: optional methods, not a subtype. Capability detection via `?.` keeps the `Readable` interface stable for bins. A bin that doesn't care about raw mode never sees these methods. A WASM runner that needs them checks `proc.stdin.setRawMode !== undefined` before calling.
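A sketch of how a runner might use both capabilities through optional-method detection. `StdinLike`, `prebufferStdin`, and `runWithTty` are hypothetical names, not the actual `wasmExec` code. Raw mode is restored in `finally` so a crashing program doesn't leave the terminal raw, and on a kernel pipe (no `tryRead`) pre-buffering does nothing, so pipe content stays for the program to read.

```typescript
// Core read plus the optional platform-adapter capabilities.
interface StdinLike {
  read(count?: number): Promise<Uint8Array>
  setRawMode?(raw: boolean): void
  tryRead?(count?: number): Uint8Array
}

// Hypothetical helper: synchronously drain whatever is already buffered.
function prebufferStdin(stdin: StdinLike): Uint8Array[] {
  const chunks: Uint8Array[] = []
  while (true) {
    // undefined on kernel-internal streams (no tryRead): nothing to drain
    const chunk = stdin.tryRead?.()
    if (!chunk || chunk.length === 0) break
    chunks.push(chunk)
  }
  return chunks
}

// Hypothetical wrapper: enable raw mode only if requested and supported.
async function runWithTty<T>(
  stdin: StdinLike,
  ttyMode: 'raw' | 'line',
  run: () => Promise<T>,
): Promise<T> {
  const useRaw = ttyMode === 'raw' && stdin.setRawMode !== undefined
  if (useRaw) stdin.setRawMode?.(true)
  try {
    return await run()
  } finally {
    // Restore line mode even if the program throws.
    if (useRaw) stdin.setRawMode?.(false)
  }
}
```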
ProcContext extension
```typescript
interface ProcContext {
  // ...
  getTermSize?(): TerminalSize
}
```

`getTermSize?(): TerminalSize` returns the current terminal dimensions (`{ rows, cols }`). It is present only when a controlling terminal exists (i.e., the process was spawned from a platform adapter that provided `getTermSize`). `wasmExec` wires this into the Asyncify TTY bridge so WASM programs receive correct `TIOCGWINSZ` responses.
TerminalSize type
Defined in src/kernel/types.ts:
```typescript
interface TerminalSize {
  rows: number
  cols: number
}
```
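For context on the `TIOCGWINSZ` bridge: that ioctl fills in a C `struct winsize`, which on Linux (and in musl, which Emscripten uses) is four `unsigned short` fields: `ws_row`, `ws_col`, `ws_xpixel`, `ws_ypixel`. Below is a minimal sketch of writing a `TerminalSize` into WASM linear memory in that layout. The `writeWinsize` name, the `ptr` argument, and reporting zero for the pixel fields are assumptions, not the actual Asyncify bridge code. WASM memory is little-endian, hence the `true` flag on each `setUint16`.

```typescript
interface TerminalSize {
  rows: number
  cols: number
}

// Hypothetical helper: store `size` as struct winsize
// { ws_row, ws_col, ws_xpixel, ws_ypixel } (four little-endian uint16 values,
// 8 bytes total) at byte offset `ptr` in `memory`.
function writeWinsize(memory: ArrayBuffer, ptr: number, size: TerminalSize): void {
  const view = new DataView(memory, ptr, 8)
  view.setUint16(0, size.rows, true) // ws_row
  view.setUint16(2, size.cols, true) // ws_col
  view.setUint16(4, 0, true)         // ws_xpixel: unknown, report 0
  view.setUint16(6, 0, true)         // ws_ypixel: unknown, report 0
}
```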