
Quick Start

TypeScript / Node.js

Call preprocessString() with a raw email string, then pass the result to toLlmContext() to get a clean, LLM-ready context block. (If you already have the email as a Buffer, use preprocess() instead.)

import { preprocessString, toLlmContext } from "langmail"
import { readFileSync } from "fs"

// load a raw .eml file as a string
const raw = readFileSync("email.eml", "utf8")

// parse and clean (synchronous)
const parsed = preprocessString(raw)

// serialize to LLM-ready context
const context = toLlmContext(parsed)

console.log(context)

Example output

Given a typical reply-chain email, the output looks like this:

FROM: Alice <alice@example.com>
SUBJECT: Q4 budget review
DATE: 2024-11-12

CONTENT:
Hi,

Following up on the Q4 numbers. Can you send
the updated forecast by Friday?

Gmail API

If your Node app is already calling gmail.users.messages.get({ format: "full" }) through googleapis, feed the parsed response directly to preprocessGmail instead of switching to format: "raw":

import { preprocessGmail, toLlmContext } from "langmail"
import { google } from "googleapis"

const gmail = google.gmail({ version: "v1", auth })
const { data: msg } = await gmail.users.messages.get({
  userId: "me",
  id: messageId,
  format: "full",
})

const parsed  = preprocessGmail(msg)
const context = toLlmContext(parsed)

preprocessGmail walks payload.parts, base64url-decodes the HTML/text body, normalizes headers, and runs the same cleaning pipeline as preprocess — no MIME re-parsing, no extra fetch.

Note

langmail does not bundle or depend on googleapis — only the shape of the response is consumed. Install googleapis (or any client that returns the same Schema$Message) separately.

Python

preprocess() takes raw bytes, so open the file in binary mode ("rb").

from langmail import preprocess, to_llm_context

with open("email.eml", "rb") as f:
    raw = f.read()

parsed  = preprocess(raw)
context = to_llm_context(parsed)

print(context)
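
To see why binary mode matters, here is a minimal sketch that uses only the Python standard library and does not call langmail. The sample message, the temporary file, and the variable names are made up for illustration. The body is ISO-8859-1 encoded, so reading the file as UTF-8 text fails, while reading it in "rb" mode returns the bytes unchanged.

```python
import os
import tempfile

# Hypothetical sample message: an 8-bit body declared as ISO-8859-1.
raw = (
    b"From: Alice <alice@example.com>\r\n"
    b"Subject: Lunch\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Caf\xe9 at noon?\r\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "email.eml")
    with open(path, "wb") as f:
        f.write(raw)

    # Binary mode hands back exactly the bytes that were stored.
    with open(path, "rb") as f:
        from_binary = f.read()

    # Text mode has to decode first, and 0xE9 followed by a space
    # is not a valid UTF-8 sequence.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True
```

Even when decoding succeeds, text mode translates line endings, so the bytes handed to a parser would no longer match the file on disk.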

Gmail API

Calling the Gmail API with format='full'? Serialize the response with json.dumps and feed it straight into preprocess_gmail — no need to re-fetch the raw MIME:

import json
from googleapiclient.discovery import build
from langmail import preprocess_gmail, to_llm_context

gmail = build("gmail", "v1", credentials=creds)
msg = gmail.users().messages().get(
    userId="me", id=message_id, format="full"
).execute()

parsed  = preprocess_gmail(json.dumps(msg))
context = to_llm_context(parsed)

preprocess_gmail walks payload.parts, base64url-decodes the HTML/text body, and runs the same cleaning pipeline as preprocess — shared Rust core, byte-identical output.
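
For readers unfamiliar with the format='full' response shape, the sketch below illustrates what "walking payload.parts and base64url-decoding the body" involves. It is not langmail's implementation (which lives in the Rust core and may differ); iter_text_bodies, b64url, and the sample message are hypothetical names and data, and the sketch assumes UTF-8 text bodies.

```python
import base64


def iter_text_bodies(part):
    """Yield (mime_type, decoded_text) for each text/* part under a Gmail payload."""
    mime_type = part.get("mimeType", "")
    data = part.get("body", {}).get("data")
    if data and mime_type.startswith("text/"):
        # Gmail uses URL-safe base64; restore "=" padding in case it is missing.
        padded = data + "=" * (-len(data) % 4)
        text = base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
        yield mime_type, text
    # Multipart containers nest their children under "parts".
    for child in part.get("parts", []):
        yield from iter_text_bodies(child)


def b64url(text):
    """Encode text the way the sample data below expects (hypothetical helper)."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii").rstrip("=")


# Hand-built stand-in for a users.messages.get(format="full") response.
sample_msg = {
    "id": "hypothetical-id",
    "payload": {
        "mimeType": "multipart/alternative",
        "headers": [{"name": "Subject", "value": "Q4 budget review"}],
        "parts": [
            {"mimeType": "text/plain", "body": {"data": b64url("Hi,\n")}},
            {"mimeType": "text/html", "body": {"data": b64url("<p>Hi,</p>")}},
        ],
    },
}

bodies = list(iter_text_bodies(sample_msg["payload"]))
```

In practice there is no need to write this yourself; the point is only that everything preprocess_gmail needs is already present in the format='full' response.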

Rust

preprocess returns a Result<ProcessedEmail, _>, and to_llm_context is a method on ProcessedEmail.

use langmail::preprocess;

let raw = std::fs::read("email.eml")?;
let parsed  = preprocess(&raw)?;
let context = parsed.to_llm_context();

println!("{}", context);

Gmail API

Pulling a message from the Gmail API? Serialize the response body with serde_json and feed it to langmail::adapters::preprocess_gmail — no MIME re-fetch needed:

use langmail::adapters::preprocess_gmail;

// `gmail_response` is anything that serializes to the Gmail
// `users.messages.get` JSON shape (format=full).
let msg_json = serde_json::to_string(&gmail_response)?;

let parsed  = preprocess_gmail(&msg_json)?;
let context = parsed.to_llm_context();

preprocess_gmail accepts either the bare Schema$Message or a {"data": ...} googleapis-style wrapper. The body tree walk, base64url decoding, and header parsing are shared with the Node and Python bindings.

Tip

Don't have a .eml file handy? Any raw RFC 5322 message works — including bytes fetched from IMAP, the Gmail API, or any other email source.
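
If you have no message source at all, one option is to generate a test message yourself. The sketch below uses Python's standard email package to build raw RFC 5322 bytes; the addresses and content are placeholders, and the langmail call is left as a comment so the snippet runs without the library installed.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build a small message with placeholder headers and body.
msg = EmailMessage()
msg["From"] = "Alice <alice@example.com>"
msg["To"] = "Bob <bob@example.com>"
msg["Subject"] = "Q4 budget review"
msg.set_content("Hi,\n\nFollowing up on the Q4 numbers.\n")

# Serialize to raw bytes with CRLF line endings, as on the wire.
raw = msg.as_bytes(policy=policy.SMTP)

# These bytes are the kind of input preprocess() expects, e.g.:
#   parsed = preprocess(raw)

# Sanity check with the standard library parser.
roundtrip = message_from_bytes(raw, policy=policy.default)
```

Bytes obtained elsewhere, for example from an IMAP fetch of the full message, can be used the same way.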