Quick Start
TypeScript / Node.js
Call preprocessString() with a raw email string, then pass the result to toLlmContext() to get a clean, LLM-ready context block. (If you already have the email as a Buffer, use preprocess() instead.)
import { preprocessString, toLlmContext } from "langmail"
import { readFileSync } from "fs"
// load a raw .eml file as a string
const raw = readFileSync("email.eml", "utf8")
// parse and clean (synchronous)
const parsed = preprocessString(raw)
// serialize to LLM-ready context
const context = toLlmContext(parsed)
console.log(context)
Example output
Given a typical reply-chain email, the output looks like this:
FROM: Alice <alice@example.com>
SUBJECT: Q4 budget review
DATE: 2024-11-12
CONTENT:
Hi,
Following up on the Q4 numbers. Can you send
the updated forecast by Friday?
Gmail API
If your Node app is already calling gmail.users.messages.get({ format: "full" }) through googleapis, feed the parsed response directly to preprocessGmail instead of switching to format: "raw":
import { preprocessGmail, toLlmContext } from "langmail"
import { google } from "googleapis"
const gmail = google.gmail({ version: "v1", auth })
const { data: msg } = await gmail.users.messages.get({
  userId: "me",
  id: messageId,
  format: "full",
})
const parsed = preprocessGmail(msg)
const context = toLlmContext(parsed)
preprocessGmail walks payload.parts, base64url-decodes the HTML/text body, normalizes headers, and runs the same cleaning pipeline as preprocess — no MIME re-parsing, no extra fetch.
Note
langmail does not bundle or depend on googleapis — only the shape of the response is consumed. Install googleapis (or any client that returns the same Schema$Message) separately.
Python
preprocess() takes raw bytes, so open the file in binary mode ("rb").
from langmail import preprocess, to_llm_context
with open("email.eml", "rb") as f:
    raw = f.read()
parsed = preprocess(raw)
context = to_llm_context(parsed)
print(context)
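To see why binary mode matters, here is a small standard-library sketch. It makes no langmail calls, and the message contents are made up for illustration. A raw email is a byte stream that may mix encodings, so reading it in text mode can fail before any parser sees it.

```python
import os
import tempfile

# A made-up message: CRLF line endings and one Latin-1 byte (0xE9, "é") in the body.
raw_message = (
    b"From: alice@example.com\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "email.eml")
    with open(path, "wb") as f:
        f.write(raw_message)

    # Binary mode returns the bytes exactly as stored on disk.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode has to decode first; strict UTF-8 rejects the lone 0xE9 byte.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

print(as_bytes == raw_message)
print(text_mode_failed)
```

Reading as bytes leaves charset handling to the email parser, which can honor the Content-Type header of each part.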
Gmail API
Calling the Gmail API with format='full'? Serialize the response with
json.dumps and feed it straight into preprocess_gmail — no need to
re-fetch the raw MIME:
import json
from googleapiclient.discovery import build
from langmail import preprocess_gmail, to_llm_context
gmail = build("gmail", "v1", credentials=creds)
msg = gmail.users().messages().get(
    userId="me", id=message_id, format="full"
).execute()
parsed = preprocess_gmail(json.dumps(msg))
context = to_llm_context(parsed)
preprocess_gmail walks payload.parts, base64url-decodes the HTML/text
body, and runs the same cleaning pipeline as preprocess. The bindings
share one Rust core, so the output is byte-identical across languages.
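The walk-and-decode step can be pictured with a small standard-library sketch. This is not langmail's implementation: collect_bodies and decode_b64url are hypothetical helpers, the sample message is made up, and UTF-8 bodies are assumed for simplicity. It only illustrates the shape of a format="full" response, where each part carries a mimeType, an optional base64url-encoded body.data, and optional nested parts.

```python
import base64


def decode_b64url(data):
    """Decode base64url text, restoring any padding that was stripped."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def collect_bodies(payload):
    """Walk the part tree depth-first and return the first text/plain and
    text/html bodies found, keyed by MIME type (hypothetical helper)."""
    found = {}
    stack = [payload]
    while stack:
        part = stack.pop()
        mime = part.get("mimeType", "")
        data = part.get("body", {}).get("data")
        if data and mime in ("text/plain", "text/html") and mime not in found:
            # Assumption: bodies are UTF-8; a real parser reads the charset header.
            found[mime] = decode_b64url(data).decode("utf-8", errors="replace")
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(part.get("parts", [])))
    return found


def b64url(text):
    """Encode sample text the way the response carries it (unpadded here)."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii").rstrip("=")


# A made-up response in the users.messages.get (format="full") shape.
sample = {
    "payload": {
        "mimeType": "multipart/alternative",
        "headers": [{"name": "Subject", "value": "Q4 budget review"}],
        "parts": [
            {"mimeType": "text/plain",
             "body": {"data": b64url("Hi,\nFollowing up on the Q4 numbers.")}},
            {"mimeType": "text/html",
             "body": {"data": b64url("<p>Hi,</p><p>Following up on the Q4 numbers.</p>")}},
        ],
    }
}

bodies = collect_bodies(sample["payload"])
print(bodies["text/plain"])
```

In practice preprocess_gmail does this work for you; the sketch is only meant to show what the JSON carries.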
Rust
preprocess returns a Result<ProcessedEmail, _>, and to_llm_context is a method on ProcessedEmail.
use langmail::preprocess;
let raw = std::fs::read("email.eml")?;
let parsed = preprocess(&raw)?;
let context = parsed.to_llm_context();
println!("{}", context);
Gmail API
Pulling a message from the Gmail API? Serialize the response body with
serde_json and feed it to langmail::adapters::preprocess_gmail — no MIME
re-fetch needed:
use langmail::adapters::preprocess_gmail;
// `gmail_response` is anything that serializes to the Gmail
// `users.messages.get` JSON shape (format=full).
let msg_json = serde_json::to_string(&gmail_response)?;
let parsed = preprocess_gmail(&msg_json)?;
let context = parsed.to_llm_context();
preprocess_gmail accepts either the bare Schema$Message or a {"data": ...}
googleapis-style wrapper. The body-tree walk, base64url decoding, and header
parsing are shared with the Node and Python bindings.
Tip
Don't have a .eml file handy? Any raw RFC 5322 message works — including bytes fetched from IMAP, the Gmail API, or any other email source.
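As one illustration of getting raw bytes from another source, the Gmail API returns the whole message as a base64url string in the raw field when called with format="raw". The sketch below uses only the Python standard library; gmail_raw_to_bytes is a hypothetical helper, and the response is a stand-in built locally rather than fetched from the API.

```python
import base64
from email.message import EmailMessage


def gmail_raw_to_bytes(msg):
    """Decode the base64url `raw` field of a format="raw" response into
    RFC 5322 bytes (hypothetical helper, not part of langmail)."""
    raw_field = msg["raw"]
    return base64.urlsafe_b64decode(raw_field + "=" * (-len(raw_field) % 4))


# Build a stand-in message locally instead of calling the API.
message = EmailMessage()
message["From"] = "Alice <alice@example.com>"
message["To"] = "Bob <bob@example.com>"
message["Subject"] = "Q4 budget review"
message.set_content("Hi,\nFollowing up on the Q4 numbers.\n")
original = message.as_bytes()

# Mimic the response shape: the message bytes, base64url-encoded under "raw".
fake_response = {"raw": base64.urlsafe_b64encode(original).decode("ascii")}

raw = gmail_raw_to_bytes(fake_response)
# `raw` is now plain message bytes, the same kind of input that
# preprocess(raw) takes in the Python example above.
print(raw == original)
```

Bytes returned by an IMAP fetch of the full message can be used the same way, without the base64url step.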