Streams & Encoding

The data contract between fileservers, kernel, and processes. Two layers: raw bytes at the bottom, string convenience on top.

Wire Format

Decision: Uint8Array is the universal wire format. Every fileserver read() returns Uint8Array, every write() accepts Uint8Array. The kernel never interprets contents. This keeps the protocol honest — a fileserver for /dev/random and a fileserver for /home use the exact same types. Binary correctness for free, no special-casing.

// Fileserver protocol (bytes only)
interface Fileserver {
  read(fd: unknown, offset: number, count: number): Promise<Uint8Array>
  write(fd: unknown, offset: number, data: Uint8Array): Promise<number>
  // ...
}
// Kernel fd operations (bytes only)
interface Kernel {
  read(pid: number, fd: number, count: number): Promise<Uint8Array>
  write(pid: number, fd: number, data: Uint8Array): Promise<number>
  // ...
}
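
To make the contract concrete, here is a minimal sketch of a fileserver backed by a single in-memory buffer. MemoryFileserver and its grow-on-write strategy are hypothetical and only illustrate the shape of the protocol; the interface is re-declared locally, with the elided members left out, so the block stands alone.

```typescript
// Local copy of the byte-only protocol (elided members omitted).
interface Fileserver {
  read(fd: unknown, offset: number, count: number): Promise<Uint8Array>
  write(fd: unknown, offset: number, data: Uint8Array): Promise<number>
}

// Hypothetical fileserver holding one file's bytes in memory; fd is ignored.
class MemoryFileserver implements Fileserver {
  private bytes = new Uint8Array(0)

  async read(_fd: unknown, offset: number, count: number): Promise<Uint8Array> {
    // slice() clamps to the buffer length, so a read at or past EOF
    // returns an empty array.
    return this.bytes.slice(offset, offset + count)
  }

  async write(_fd: unknown, offset: number, data: Uint8Array): Promise<number> {
    const end = offset + data.length
    if (end > this.bytes.length) {
      // Grow the buffer; any gap before `offset` stays zero-filled.
      const grown = new Uint8Array(end)
      grown.set(this.bytes)
      this.bytes = grown
    }
    this.bytes.set(data, offset)
    return data.length // number of bytes written
  }
}
```

Nothing in the sketch inspects the contents: arbitrary binary data and text take the same path.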

Encoding

Decision: UTF-8, not configurable. The string convenience layer assumes UTF-8. There’s no use case for other encodings in an LLM shell environment — LLMs speak UTF-8. If a bin needs to handle arbitrary encodings, it works with raw bytes and brings its own decoder. No encoding parameter on streams, no codepage tables.

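Shared instances, created once and used for stateless, one-shot encoding and decoding (LineReader keeps its own streaming decoder, described below):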

const encoder = new TextEncoder() // always UTF-8
const decoder = new TextDecoder('utf-8') // always UTF-8
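
One practical consequence of fixing UTF-8 is that byte counts and string lengths diverge as soon as text leaves ASCII, which matters because write() reports bytes. The following is a small illustration; the sample string is arbitrary.

```typescript
const encoder = new TextEncoder() // always UTF-8
const decoder = new TextDecoder('utf-8')

// "naïve ✓" written with escapes so the code points are unambiguous
const text = 'na\u00EFve \u2713'
const bytes = encoder.encode(text)

// U+00EF encodes to 2 bytes and U+2713 to 3, so:
//   text.length  is 7  (UTF-16 code units)
//   bytes.length is 10 (UTF-8 bytes)
const roundTripped = decoder.decode(bytes)
console.log(text.length, bytes.length, roundTripped === text)
```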

Two-Layer Architecture

┌─────────────────────────────────────────────┐
│  String Layer (ProcContext)                 │
│  proc.stdin → async iterable of lines       │
│  proc.stdout.write(string) → auto-encodes   │
│  proc.stderr.write(string) → auto-encodes   │
├─────────────────────────────────────────────┤
│  Byte Layer (Kernel)                        │
│  kernel.read(pid, fd, count) → Uint8Array   │
│  kernel.write(pid, fd, data) → number       │
└─────────────────────────────────────────────┘

Bins use the string layer by default. Bins that need raw bytes drop down to proc.fs.read() / proc.fs.write(), which go straight to kernel byte ops.

String Convenience Layer

This is the API surface bins actually interact with. Built on top of kernel byte operations.

stdout / stderr (Writable)

interface Writable {
  write(data: string | Uint8Array): Promise<number>
}
  • If data is a string → encode to UTF-8 via TextEncoder, then write bytes to kernel
  • If data is Uint8Array → write bytes directly
  • Returns byte count written

Decision: accept both string and Uint8Array on write. No reason to force bins to manually encode. The common case is proc.stdout.write("hello\n") and it should just work. Bins doing binary I/O pass Uint8Array and skip encoding. One method, no overload confusion.
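
A minimal sketch of how such a Writable could sit on top of a byte sink. ByteSink and makeWritable are hypothetical names; in the real system the sink would presumably be a bound kernel.write(pid, fd, data) call.

```typescript
interface Writable {
  write(data: string | Uint8Array): Promise<number>
}

// Hypothetical stand-in for kernel.write(pid, fd, data) with pid and fd bound
type ByteSink = (data: Uint8Array) => Promise<number>

const encoder = new TextEncoder() // always UTF-8

function makeWritable(sink: ByteSink): Writable {
  return {
    write(data) {
      // Strings are encoded once here; byte arrays pass through untouched.
      const bytes = typeof data === 'string' ? encoder.encode(data) : data
      return sink(bytes)
    },
  }
}
```

Because the return value is a byte count, writing a two-character string such as "é\n" reports 3, not 2.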

stdin (Readable)

interface Readable {
  [Symbol.asyncIterator](): AsyncIterator<string> // line-by-line
  read(count?: number): Promise<Uint8Array> // raw bytes
}

Two modes of consumption:

Line mode (default, async iterable):

for await (const line of proc.stdin) {
  // line is a string, no trailing \n
}

Byte mode (explicit):

const chunk = await proc.stdin.read(1024) // Uint8Array

Decision: line mode is the default, byte mode is opt-in. LLM bins overwhelmingly process text line-by-line. Making the async iterable yield lines matches what grep, sed, awk, etc. expect. Bins that need raw bytes (e.g., a binary file copier) call .read() directly.

Line Splitting

Decision: the Readable layer splits lines, not the pipe or the consumer. The Readable wrapping stdin handles buffering partial UTF-8 sequences and splitting on \n. This logic lives in exactly one place — the stream layer — rather than being reimplemented in every bin.

Implementation sketch:

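class LineReader {
  // Per-instance decoder: streaming mode keeps state between decode() calls,
  // so it must not be shared across streams
  private decoder = new TextDecoder('utf-8')
  private remainder = ''

  constructor(
    private kernel: Kernel,
    private pid: number,
    private fd: number,
  ) {}

  async *[Symbol.asyncIterator](): AsyncGenerator<string> {
    while (true) {
      const chunk = await this.kernel.read(this.pid, this.fd, 4096)
      // EOF
      if (chunk.length === 0) {
        // Flush bytes the decoder is still holding (an incomplete
        // sequence at EOF decodes to U+FFFD)
        const tail = this.remainder + this.decoder.decode()
        this.remainder = ''
        if (tail.length > 0) {
          yield tail
        }
        return
      }
      // stream: true buffers a partial multi-byte sequence until the next chunk
      const text = this.remainder + this.decoder.decode(chunk, { stream: true })
      const lines = text.split('\n')
      // Last element is either '' (text ended with \n) or a partial line
      this.remainder = lines.pop()!
      for (const line of lines) {
        yield line
      }
    }
  }

  async read(count?: number): Promise<Uint8Array> {
    return this.kernel.read(this.pid, this.fd, count ?? 4096)
  }
}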

Details:

  • Read chunk size is 4096 bytes — large enough to be efficient, small enough to not buffer excessively
  • Partial lines are buffered in remainder until the next \n arrives or EOF
  • On EOF, any remaining text (no trailing \n) is yielded as a final line
  • Lines are yielded WITHOUT the trailing \n — bins don’t have to strip it
  • decoder.decode(chunk, { stream: true }) handles multi-byte UTF-8 sequences that span chunk boundaries: in streaming mode, TextDecoder buffers an incomplete byte sequence until the rest arrives
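
The buffering rules above can be traced with a small synchronous sketch. linesFromChunks is a hypothetical helper that applies the same remainder logic to an in-memory list of chunks instead of kernel reads.

```typescript
// Hypothetical helper: LineReader's splitting rules over a fixed list of chunks
function linesFromChunks(chunks: Uint8Array[]): string[] {
  const decoder = new TextDecoder('utf-8')
  const out: string[] = []
  let remainder = ''
  for (const chunk of chunks) {
    const parts = (remainder + decoder.decode(chunk, { stream: true })).split('\n')
    // Last element is '' if the text ended with \n, else a partial line
    remainder = parts.pop()!
    out.push(...parts)
  }
  // EOF: flush the decoder and emit any unterminated final line
  const tail = remainder + decoder.decode()
  if (tail.length > 0) out.push(tail)
  return out
}

const enc = new TextEncoder()
const result = linesFromChunks([
  enc.encode('alpha\nbe'), // "be" is held back as a partial line
  enc.encode('ta\n\ngam'), // completes "beta", yields an empty line
  enc.encode('ma'),        // no newline: emitted at EOF
])
// result is ['alpha', 'beta', '', 'gamma']
```

Note that an empty line in the input is preserved as an empty string, while a trailing \n at EOF does not produce an extra empty line.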

TextDecoder Streaming

One subtlety: a UTF-8 character can be split across two read() calls. TextDecoder with { stream: true } handles this:

const decoder = new TextDecoder('utf-8')
// First read returns bytes [0xC3] (first byte of ü)
decoder.decode(new Uint8Array([0xC3]), { stream: true }) // returns ""
// Second read returns [0xBC, 0x41] (second byte of ü, then A)
decoder.decode(new Uint8Array([0xBC, 0x41]), { stream: true }) // returns "üA"

Decision: use streaming TextDecoder in LineReader. Each LineReader instance creates its own TextDecoder with streaming mode to handle chunk-boundary splits correctly. Stateless decoding would corrupt multi-byte characters.
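
To see the corruption that stateless decoding would cause, the same two chunks can be decoded both ways. This is an illustrative comparison only; the variable names are arbitrary.

```typescript
// "üA" split in the middle of the two-byte ü
const chunks = [new Uint8Array([0xc3]), new Uint8Array([0xbc, 0x41])]

// Stateless: every chunk is decoded in isolation, so the lone lead byte and
// the lone continuation byte each become U+FFFD (the replacement character).
const stateless = chunks.map((c) => new TextDecoder('utf-8').decode(c)).join('')

// Streaming: one decoder carries the partial sequence across calls.
// The final decode() with no arguments flushes any remaining state.
const streaming = new TextDecoder('utf-8')
const streamed =
  chunks.map((c) => streaming.decode(c, { stream: true })).join('') + streaming.decode()

console.log(JSON.stringify(stateless)) // two replacement characters, then "A"
console.log(JSON.stringify(streamed))  // "üA"
```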

What About proc.fs?

The proc.fs methods (proc.fs.read(), proc.fs.write()) are thin wrappers around kernel byte operations. They do NOT go through the string convenience layer:

proc.fs.read(fd, count) // → Uint8Array (raw bytes from kernel)
proc.fs.write(fd, data) // → data must be Uint8Array
proc.fs.open(path, mode) // → fd number
proc.fs.close(fd) // → void

Decision: proc.fs stays byte-only. The string layer is a property of stdin/stdout/stderr — the pre-opened streams that bins expect to be text. Arbitrary file operations via proc.fs should be explicit about encoding. A bin reading a file can use decoder.decode(await proc.fs.read(fd, n)) if it wants a string. This prevents accidental encoding of binary files.

Helper: readFile / writeFile

For convenience, proc context includes high-level helpers that handle open/read/close and encoding:

proc.fs.readFile(path: string): Promise<string> // open, read all, decode, close
proc.fs.writeFile(path: string, data: string): Promise<void> // open, encode, write, close

These are sugar — they compose from open, read/write, close, and TextEncoder/TextDecoder. They assume UTF-8 text. Bins needing binary file I/O use the raw methods.

Decision: include readFile/writeFile on proc.fs. Bins frequently need to slurp or dump a whole file. Without these, every bin reimplements the open/read-loop/close pattern. Two helpers eliminate boilerplate without bloating the API.
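
A sketch of how readFile might compose from the byte-level methods. Several details are assumptions not stated above: that open, read, and close return promises, that 'r' is a valid mode string, and that an empty read signals EOF (as it does for kernel reads in LineReader). ByteFs is a hypothetical interface used only to keep the block self-contained.

```typescript
// Hypothetical minimal shape of the byte-level proc.fs methods (assumed async)
interface ByteFs {
  open(path: string, mode: string): Promise<number>
  read(fd: number, count: number): Promise<Uint8Array>
  close(fd: number): Promise<void>
}

async function readFile(fs: ByteFs, path: string): Promise<string> {
  const fd = await fs.open(path, 'r') // 'r' is an assumed mode string
  const decoder = new TextDecoder('utf-8')
  let text = ''
  try {
    while (true) {
      const chunk = await fs.read(fd, 4096)
      if (chunk.length === 0) break // assumed EOF signal
      // Streaming mode so characters split across reads decode correctly
      text += decoder.decode(chunk, { stream: true })
    }
    return text + decoder.decode() // flush any buffered partial sequence
  } finally {
    await fs.close(fd) // always release the fd, even if a read throws
  }
}
```

This is the open/read-loop/close pattern that every bin would otherwise repeat.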

Summary of Boundaries

| Layer | Format | Who uses it |
|---|---|---|
| Fileserver protocol | Uint8Array | Fileserver implementations |
| Kernel fd ops | Uint8Array | Kernel internals |
| proc.fs.* | Uint8Array (except readFile/writeFile) | Bins doing explicit file I/O |
| proc.stdin iterator | string (lines) | Bins processing text input |
| proc.stdin.read() | Uint8Array | Bins processing binary input |
| proc.stdout.write() | string or Uint8Array | Bins producing output |
| proc.fs.readFile() | string | Bins reading whole text files |

Platform Capabilities

Platform adapters (nodeStdio, xtermStdio) expose optional capabilities beyond the core Readable interface. These are NOT implemented by kernel-internal streams (pipes, fd-backed readers).

Readable extensions

interface Readable {
  // ... core methods ...
  /** Enter / exit raw mode. Platform adapters only. */
  setRawMode?(raw: boolean): void
  /** Non-blocking drain of buffered data. Platform adapters only. */
  tryRead?(count?: number): Uint8Array
}

setRawMode?(raw: boolean): void — switches the adapter between line-buffered and raw (character-at-a-time) input. In raw mode, each keypress is delivered immediately without waiting for Enter. wasmExec calls this before running a program compiled with ttyMode: 'raw', then restores line mode on exit.

tryRead?(count?: number): Uint8Array — synchronous non-blocking read. Returns whatever bytes are currently buffered, or an empty Uint8Array if none. Used by wasmExec in line mode to pre-buffer stdin before callMain() (Emscripten’s stdin callback is synchronous and cannot await). Kernel-internal Readable implementations MUST NOT implement this — doing so would drain pipe content before the WASM program can process it.

Decision: optional methods, not a subtype. Capability detection via ?. keeps the Readable interface stable for bins. A bin that doesn’t care about raw mode never sees these methods. A WASM runner that needs them checks proc.stdin.setRawMode !== undefined before calling.
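
Capability detection might look like the following sketch. withRawMode and drainBuffered are hypothetical helpers, not part of the documented API; the interface is re-declared locally with only the optional methods.

```typescript
// Local copy of the optional capability methods (core methods omitted)
interface Readable {
  setRawMode?(raw: boolean): void
  tryRead?(count?: number): Uint8Array
}

// Hypothetical helper: run `fn` in raw mode when the stream supports it,
// and always restore line mode afterwards.
async function withRawMode<T>(stdin: Readable, fn: () => Promise<T>): Promise<T> {
  stdin.setRawMode?.(true) // silently skipped on kernel-internal streams
  try {
    return await fn()
  } finally {
    stdin.setRawMode?.(false)
  }
}

// Hypothetical helper: drain whatever is buffered, or nothing if unsupported.
function drainBuffered(stdin: Readable): Uint8Array {
  return stdin.tryRead?.() ?? new Uint8Array(0)
}
```

The optional call `?.()` is equivalent to the explicit `!== undefined` check and keeps callers free of branching.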

ProcContext extension

interface ProcContext {
  // ...
  getTermSize?(): TerminalSize
}

getTermSize?(): TerminalSize — returns the current terminal dimensions ({ rows, cols }). Present only when a controlling terminal exists (i.e., the process was spawned from a platform adapter that provided getTermSize). wasmExec wires this into the Asyncify TTY bridge so WASM programs receive correct TIOCGWINSZ responses.
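
A bin that wants dimensions regardless of whether a terminal is attached could fall back to a default, as in this sketch. The 24x80 fallback and the termSizeOrDefault helper are assumptions for illustration; the source does not specify a default size.

```typescript
interface TerminalSize {
  rows: number
  cols: number
}

// Local copy with only the optional method (other members omitted)
interface ProcContext {
  getTermSize?(): TerminalSize
}

// Assumed fallback, not specified by the source
const FALLBACK_SIZE: TerminalSize = { rows: 24, cols: 80 }

// Hypothetical helper: use the real size when a controlling terminal exists
function termSizeOrDefault(proc: ProcContext): TerminalSize {
  return proc.getTermSize?.() ?? FALLBACK_SIZE
}
```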

TerminalSize type

Defined in src/kernel/types.ts:

interface TerminalSize {
  rows: number
  cols: number
}