Multi-Probe Bayesian Spam Gating: Filtering Junk Before Spending Compute
In a B2B lead generation pipeline, every email that arrives costs compute. Scoring it for buyer intent, extracting entities, predicting reply probability, matching it against your ideal customer profile — each module is a DeBERTa forward pass. If 40% of inbound email is template spam, AI-generated slop, or mass-sent campaigns, you are burning 40% of your GPU budget on garbage.
The solution is a gating module: a spam classifier that sits at stage 2 of the pipeline and filters junk before anything else runs. But a binary spam/not-spam classifier is too blunt. You need to know why something is spam (template? AI-generated? role account?), how confident you are (is it ambiguous, or have you never seen this pattern before?), and which provider will block it (Gmail is stricter than Yahoo on link density).
This article documents a hierarchical Bayesian spam gating system with 4 aspect-specific attention probes, information-theoretic AI detection features, uncertainty decomposition, and a full Rust distillation path. The Python model trains on DeBERTa-v3-base. The Rust classifier runs at batch speed with 24 features and zero ML dependencies.
The Pipeline: What the Gate Protects
The spam module does not exist in isolation. It is stage 2 of a 15-module pipeline called SalesCue, where every module shares a single DeBERTa-v3-base encoder loaded as a thread-safe singleton. One encoder forward pass produces 768-dim token embeddings; then each module runs its own specialized head on those embeddings. The spam gate decides which emails proceed to the remaining 14 heads — and which get discarded before any of that compute is spent.
The 15 modules registered in the engine, with their ML techniques:
| Module | Technique | What It Does |
|---|---|---|
| spam | Hierarchical Bayesian Attention Gate | Filters junk (this article) |
| score | Causal Signal Attribution (32 signals) | Lead scoring: hot/warm/cold/disqualified |
| intent | Neural Hawkes Process | Buying stage prediction (unaware → purchasing) |
| reply | Constrained CRF | Reply classification (10 types, structural constraints) |
| triggers | Temporal Displacement Model | Event freshness (funding, hiring surge, product launch) |
| icp | Wasserstein ICP Matcher | Ideal customer profile distance (6 dimensions) |
| call | Conditional Neural Process | Conversation scoring, commitment extraction |
| subject | Contextual Bradley-Terry | Subject line ranking conditioned on prospect context |
| sentiment | MI-Minimized Disentanglement | Sentiment decoupled from intent (7 sentiments x 4 intents) |
| entities | Regex + Pointer NER + Re-typing | Hybrid entity extraction (email, phone, company, role) |
| objection | 3-Way Pre-classifier + Coaching Cards | Objection handling (genuine/stall/misunderstanding, 12 types) |
| emailgen | Qwen LoRA Generator | Personalized email generation (separate LLM, not shared encoder) |
| survival | Deep Survival Machine (Weibull mixture) | Time-to-conversion prediction, risk groups |
| anomaly | DAGMM-inspired Signal Anomaly Detection | Detects 9 anomaly types (hiring spike, funding event) |
| bandit | Contextual Thompson Sampling | Outreach optimization (125 arms: template x timing x style) |
| graph | GraphSAGE (2-layer GNN) | Company relationship scoring (8 edge types, 5 graph labels) |
Every text-based module calls process(encoded, text) where encoded is the shared encoder output. The encoder singleton (SharedEncoder in backbone.py) uses double-checked locking with an RLock to ensure the DeBERTa model loads exactly once, even under concurrent access:
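import threading
from transformers import AutoModel, AutoTokenizer  # assumed loader; the original elides the load step
_encoder = _tokenizer = None
_lock = threading.RLock()
class SharedEncoder:
    @staticmethod
    def load(model_name="microsoft/deberta-v3-base"):
        global _encoder, _tokenizer
        if _encoder is not None:  # fast path: no lock once loaded
            return _encoder, _tokenizer
        with _lock:
            if _encoder is not None:  # double-check after acquiring lock
                return _encoder, _tokenizer
            # Load model once, serve all 15 modules (tokenizer first, so the fast path never sees a half-loaded pair)
            _tokenizer = AutoTokenizer.from_pretrained(model_name)
            _encoder = AutoModel.from_pretrained(model_name).eval()
            return _encoder, _tokenizer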
Why This Exists: The Compute Economics
The business case is arithmetic. Each downstream module head costs 15-40ms of post-encoder computation per email. With 14 modules downstream of the spam gate, a single junk email wastes roughly 350ms of module-specific inference (14 modules x ~25ms average).
At scale:
| Metric | Value |
|---|---|
| Daily inbound volume | ~1,000 emails |
| Spam rate (observed) | ~40% |
| Spam emails per day | ~400 |
| Wasted compute per spam email | ~350ms (14 modules x 25ms) |
| Daily compute waste without gate | ~140 seconds |
| Spam gate cost (Rust, batch of 256) | less than 1ms |
| Spam gate cost (DeBERTa, per email) | ~20ms |
| ROI: cost-to-savings ratio | 1:350 (Rust path) |
The Bloom filter makes this even more asymmetric. Known spam domains (mailinator.com, guerrillamail.com, etc.) are rejected in a single O(1) hash check — before feature extraction, before the classifier, before the encoder even runs. The cheapest possible rejection.
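A minimal sketch of that pre-check, assuming a hand-rolled Bloom filter keyed on the sender domain. The class, the bit-array size, the hash count, and the `is_known_spam_sender` helper are illustrative choices, not the production implementation.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, small false-positive rate."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions from two halves of one SHA-256 digest
        # (double hashing), so each lookup costs a single hash computation.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def sender_domain(address: str) -> str:
    """Lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


# Hypothetical seed list; the real list of disposable providers is much longer.
blocked = BloomFilter()
for domain in ("mailinator.com", "guerrillamail.com"):
    blocked.add(domain)


def is_known_spam_sender(address: str) -> bool:
    return sender_domain(address) in blocked
```

A Bloom filter can return false positives, so given the cost asymmetry discussed in this section, one reasonable design is to confirm a hit against an exact set before rejecting the email.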
But the cost argument alone doesn't justify a Bayesian hierarchical architecture. A simple keyword filter would save the same compute. The deeper motivation is false positive cost. In B2B lead generation, a false positive — blocking a legitimate lead's email — is catastrophically more expensive than a false negative. A missed spam email wastes 350ms of compute. A blocked lead costs a potential deal worth thousands. The Bayesian uncertainty decomposition exists specifically to handle this asymmetry: when the model is uncertain, it quarantines rather than blocks, and the uncertainty type (aleatoric vs epistemic) tells operators whether to add more training data or accept inherent ambiguity.
Production Integration: Rust in the Outreach Pipeline
In production, the Rust spam classifier is embedded in the outreach team's quality gate. When the LLM drafts a personalized email, check_quality() runs before the email is approved for sending:
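fn check_quality(subject: &str, body: &str) -> QualityChecks {
    // Counted in characters (not bytes) so non-ASCII subjects are measured correctly
    let subject_len = subject.chars().count();
    let word_count = body.split_whitespace().count();
    // Spam score heuristic
    let mut spam: f64 = 0.0;
    let lower_subj = subject.to_lowercase();
    let spam_triggers = [
        "free", "urgent", "act now", "limited time", "winner",
        "click here", "buy now", "!!!",
    ];
    for trigger in &spam_triggers {
        if lower_subj.contains(trigger) {
            spam += 0.15;
        }
    }
    let upper_count = subject.chars().filter(|c| c.is_uppercase()).count();
    if upper_count > subject_len / 2 {
        spam += 0.2; // >50% caps in subject → spam signal
    }
    // Assumed CTA check: the original computes has_cta in code not shown here
    let lower_body = body.to_lowercase();
    let has_cta = ["call", "schedule", "book", "demo"]
        .iter()
        .any(|cta| lower_body.contains(cta));
    QualityChecks {
        subject_length_ok: subject_len > 0 && subject_len <= 60,
        body_word_count: word_count,
        body_length_ok: (100..=250).contains(&word_count),
        has_cta,
        spam_score: f64::min(spam, 1.0),
    }
}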
The QualityChecks struct feeds the outreach approval gate. If the spam score exceeds the threshold, the draft is rejected before it ever reaches the send queue. This dual-path architecture — lightweight heuristics in the outreach hot path, full Bayesian classifier for inbound scoring — lets the system operate at two speed tiers: sub-millisecond for outbound quality checks, sub-second for inbound classification with full uncertainty quantification.
The spam scoring kernel is one of 12 feature-gated kernels in the metal crate, compiled via Cargo feature flags:
[features]
kernel = [
"kernel-btree", "kernel-scoring", "kernel-html",
"kernel-arena", "kernel-timer", "kernel-crc",
"kernel-ner", "kernel-ring", "kernel-eval",
"kernel-extract", "kernel-intent", "kernel-spam"
]
Each kernel can be independently compiled in or out. The kernel-spam flag depends on kernel-scoring, ensuring the base scoring infrastructure is always available when spam detection is enabled.
Training Data: From Neon to Rust Weights
The training pipeline is a four-stage process that converts raw email data in Neon PostgreSQL into production Rust classifier weights.
Stage 1: Data Export (export_spam_data.py). Fetches emails from three Neon tables — sent_emails (delivery status), received_emails (inbound with reply classification), and email_campaigns (campaign-level metrics). Each email is labeled via heuristic rules:
| Label | Heuristic |
|---|---|
| role_account | From address starts with noreply@, info@, billing@, support@, etc. |
| ai_generated | Flagged by the emailgen pipeline (known LLM-generated) |
| template_spam | Low personalization score + template ID is set |
| content_violation | Spam keyword density exceeds 0.7 |
| low_effort | Word count below 30 and personalization below 0.2 |
| domain_suspect | Sender domain matches disposable provider list |
| clean | Default: everything else |
Stage 2: DeBERTa Training. The SpamHead (1,352 lines of PyTorch) trains on the labeled data using the uncertainty-weighted multi-task loss described earlier. The model produces 7-class soft labels for every email.
Stage 3: Distillation (distill_spam_classifier.py). Extracts 24-element feature vectors from every email using the same feature extraction logic as the Rust classifier (same keyword lists, same homoglyph codepoints, same circadian encoding). Fits 7 independent logistic regressions via SGD with L2 regularization, one per spam category. The feature parity between Python and Rust is exact — both implementations share the same constants:
# Python distillation (identical to Rust SPAM_KEYWORDS)
SPAM_KEYWORDS = [
"free", "act now", "limited time", "guaranteed", "no obligation",
"click here", "buy now", "discount", "winner", "congratulations",
# ... 33 keywords total
]
Stage 4: Weight Export. Serializes the 7 weight vectors (24 floats each) and 7 biases as JSON. The Rust SpamClassifier::from_json() deserializes directly into the production struct:
pub fn from_json(path: &std::path::Path) -> Self {
    std::fs::read_to_string(path)
        .ok()
        .and_then(|s| serde_json::from_str(&s).ok())
        .unwrap_or_default() // falls back to untrained (all zeros)
}
The fallback-to-default design means a missing or corrupt weights file doesn't crash the pipeline — it degrades to an untrained classifier that passes everything through, which is the safe default for a gating module (no false positives).
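The distillation and export stages can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the contents of distill_spam_classifier.py: the learning rate, regularization strength, epoch count, and the JSON layout (`weights` / `bias` per category) are all hypothetical, and the schema that SpamClassifier::from_json() actually expects may differ.

```python
import json
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, l2=1e-3, epochs=200, seed=0):
    """One logistic regression trained by SGD with L2 regularization.

    y holds soft targets in [0, 1] (teacher probabilities), so the
    cross-entropy gradient is simply (prediction - target).
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            pred = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = pred - y[i]
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def distill(X, soft_labels, categories) -> str:
    """Fit one independent regression per category and serialize as JSON."""
    model = {}
    for k, name in enumerate(categories):
        targets = [row[k] for row in soft_labels]
        w, b = fit_logistic(X, targets)
        model[name] = {"weights": w, "bias": b}  # hypothetical schema
    return json.dumps(model)
```

In the real pipeline, `X` would hold the 24-element feature vectors and `soft_labels` the 7-class teacher outputs, giving 7 weight vectors of 24 floats each.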
Architecture: Why Hierarchical, Why Bayesian
Spam signals are level-dependent. The word "FREE" in a token is a different signal than an urgency pattern across a sentence, which is different from a document-level profile of high link density with failed SPF authentication. Operating at a single granularity — as most spam classifiers do — collapses these distinctions.
The module operates at three levels simultaneously:
- Token-level: Learned attention probes identify which individual tokens contribute to spamminess, with Beta distribution priors that quantify per-token uncertainty
- Sentence-level: Token posteriors are aggregated within each sentence's token span, combined with 12 structural features (greeting detection, CTA presence, urgency words, personalization signals)
- Document-level: Attention-weighted sentence aggregation feeds a 7-category classifier with information-theoretic features (character entropy, compression ratio)
Why Beta Priors
The Bayesian framing is not decorative. The Beta distribution is the conjugate prior for Bernoulli observations, which makes it the natural choice for binary "spam or not" token-level indicators. Each token gets a Beta($\alpha$, $\beta$) prior where:
- The mean $\mu = \alpha / (\alpha + \beta)$ is the expected spam contribution
- The precision $\alpha + \beta$ quantifies confidence: high precision means the model is sure about this token's role
- The variance $\alpha\beta / ((\alpha + \beta)^2 (\alpha + \beta + 1))$ decreases as precision increases
This propagates through the hierarchy: a sentence full of high-precision spam tokens is a stronger signal than a sentence with low-precision ambiguous tokens. A document where all sentences have high-confidence spam scores should be blocked immediately; a document with mixed-confidence sentences should be quarantined for review.
The Beta distribution's two parameters also give us a natural uncertainty decomposition that we exploit later — aleatoric versus epistemic — which a point-estimate classifier cannot provide.
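As a small worked example of these three quantities, the sketch below evaluates them for a few hand-picked parameter pairs. The `beta_stats` helper and the example values are illustrative only.

```python
def beta_stats(alpha: float, beta: float):
    """Return (mean, precision, variance) of a Beta(alpha, beta) distribution."""
    precision = alpha + beta
    mean = alpha / precision
    variance = (alpha * beta) / (precision ** 2 * (precision + 1.0))
    return mean, precision, variance


# Same mean (0.6) at two precisions, plus the uniform Beta(1, 1) for reference.
for a, b in [(1.0, 1.0), (3.0, 2.0), (30.0, 20.0)]:
    mean, precision, variance = beta_stats(a, b)
    print(f"Beta({a:g}, {b:g}): mean={mean:.3f} precision={precision:g} var={variance:.5f}")
```

Beta(3, 2) and Beta(30, 20) share the same mean, but the second has ten times the precision and a much smaller variance, which is what lets the hierarchy treat it as a stronger signal.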
The Four Aspect Probes
A single attention probe conflates different types of spam signals. A keyword-spam email ("FREE GUARANTEED WINNER") and an AI-generated email (perfectly grammatical, zero personalization) activate different patterns, but a single probe must compress both into one attention distribution.
The solution is multi-probe attention — four learned query vectors, each specialized for a different spam aspect:
self.probes = nn.ParameterDict({
    "content": nn.Parameter(torch.randn(1, h4)),    # keyword-level spam signals
    "structure": nn.Parameter(torch.randn(1, h4)),  # formatting/template signals
    "deception": nn.Parameter(torch.randn(1, h4)),  # urgency/manipulation
    "synthetic": nn.Parameter(torch.randn(1, h4)),  # AI-generated content signals
})
Scaled Dot-Product Attention with Beta Posteriors
Each probe independently computes attention over the token sequence using scaled dot-product attention with the probe vector as the query:
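$$a^{(k)} = \mathrm{softmax}\left(\frac{q_k K^\top}{\sqrt{d}}\right)$$
where $q_k$ is the learned probe for aspect $k$, $K$ is the key projection of the encoder hidden states, and $d = h/4$ is the probe dimension. The attention weights are then used to compute per-token Beta posteriors: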
for aspect in self.ASPECTS:
    probe = self.probes[aspect].unsqueeze(0)                   # (1, 1, h/4)
    attn = torch.bmm(probe, keys.transpose(1, 2)) / scale      # (1, 1, seq)
    attn_w = attn.softmax(dim=-1).squeeze(1)                   # (1, seq)
    # Beta posterior parameters: Softplus output (> 0) plus 1.0 ensures α, β > 1
    a = self.prior_alpha[aspect](values).squeeze(-1) + 1.0     # (1, seq)
    b = self.prior_beta[aspect](values).squeeze(-1) + 1.0      # (1, seq)
    posterior = a / (a + b)                                    # E[Beta] = α/(α+β)
    aspect_signals[aspect] = (attn_w * posterior).sum(dim=-1)  # (1,): one value per batch item
The Softplus activation ensures $\alpha, \beta > 0$, and the + 1.0 shift ensures $\alpha, \beta > 1$, giving a unimodal Beta distribution (the mode exists and is well-defined). Without this shift, the model could learn degenerate U-shaped priors.
Aspect Gating
A learned gating mechanism blends the four aspect signals into a single token-level spam score:
stacked_signals = torch.stack(
[aspect_signals[a] for a in self.ASPECTS], dim=-1) # (1, 4)
gate_weights = self.aspect_gate(stacked_signals) # (1, 4) softmax
token_spam_signal = (stacked_signals * gate_weights).sum(dim=-1)
The blended attention weights for downstream aggregation are computed as:
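$$\tilde{a} = \sum_{k} g_k \, a^{(k)}$$
where $g_k$ is the softmax gate weight for aspect $k$ and $a^{(k)}$ is that aspect's attention distribution over tokens. This means the sentence-level aggregation naturally focuses on whichever aspect dominates for a given email.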
The output includes per-aspect attribution: "this email was flagged 62% because of deception signals (urgency words at positions 3, 7, 12) and 28% because of synthetic signals (low perplexity variance)." This interpretability matters for false positive investigation — when a legitimate email gets quarantined, you can see exactly which aspect triggered it.
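The gating and attribution arithmetic can be illustrated without PyTorch. The sketch below assumes the gate logits are already available (in the model they come from self.aspect_gate) and assumes attribution is each aspect's share of the gated signal; the `blend` function and that attribution formula are hypothetical, chosen only to make the percentages concrete.

```python
import math

ASPECTS = ("content", "structure", "deception", "synthetic")


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def blend(aspect_attn, aspect_signal, gate_logits):
    """Blend per-aspect attention and signals with softmax gate weights.

    aspect_attn:   aspect -> attention distribution over tokens (sums to 1)
    aspect_signal: aspect -> scalar spam signal for that aspect
    gate_logits:   aspect -> unnormalized gate score (assumed given)
    """
    gate = softmax([gate_logits[a] for a in ASPECTS])
    seq_len = len(aspect_attn[ASPECTS[0]])
    # Blended attention: gate-weighted sum of the per-aspect distributions.
    blended_attn = [
        sum(g * aspect_attn[a][t] for g, a in zip(gate, ASPECTS))
        for t in range(seq_len)
    ]
    contributions = [g * aspect_signal[a] for g, a in zip(gate, ASPECTS)]
    token_spam_signal = sum(contributions)
    denom = token_spam_signal if token_spam_signal > 0 else 1.0
    # Assumed attribution: each aspect's share of the gated signal.
    attribution = {a: c / denom for a, c in zip(ASPECTS, contributions)}
    return blended_attn, token_spam_signal, attribution
```

Because every per-aspect attention vector sums to 1 and the gate weights sum to 1, the blended attention is itself a valid distribution over tokens.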
Per-Sentence Token Spans
The original implementation used global attention pooling for all sentences: every sentence got the same neural signal, differentiated only by structural features. That made the "hierarchy" decorative rather than functional.
The fix is straightforward: partition the encoder's token sequence into per-sentence spans by mapping each sentence's character offsets to approximate token indices (proportional to position in the text), then compute attention-weighted aggregation within each span:
# Map character offsets to approximate token indices
c_start, c_end = char_offsets[i]
t_start = max(1, int(c_start / text_len * (seq_len - 1)))
t_end = min(seq_len, int(c_end / text_len * (seq_len - 1)) + 1)
# Attention-weighted value within this sentence's span
span_attn = blended_attn[:, t_start:t_end]
span_attn = span_attn / (span_attn.sum(dim=-1, keepdim=True) + 1e-8)
span_values = values[:, t_start:t_end]
sent_agg = torch.bmm(span_attn.unsqueeze(1), span_values).squeeze(1)
Now a sentence containing "ACT NOW! LIMITED TIME!" produces a genuinely different neural representation than "I enjoyed our meeting last Tuesday about the Q3 roadmap" — because each sentence's aggregation is computed from its own token span, not from a global pool.
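The span arithmetic is easy to check in isolation. The sketch below reproduces the proportional mapping on plain Python lists; the function names and the empty-span guard are additions for illustration and are not confirmed to exist in the original module.

```python
def sentence_token_spans(char_offsets, text_len, seq_len):
    """Map (char_start, char_end) sentence offsets to approximate token spans.

    Token 0 is skipped (t_start >= 1) because it holds the CLS token.
    """
    spans = []
    for c_start, c_end in char_offsets:
        t_start = max(1, int(c_start / text_len * (seq_len - 1)))
        t_end = min(seq_len, int(c_end / text_len * (seq_len - 1)) + 1)
        if t_end <= t_start:  # assumed guard: never produce an empty span
            t_end = min(seq_len, t_start + 1)
        spans.append((t_start, t_end))
    return spans


def span_attention(blended_attn, span, eps=1e-8):
    """Renormalize attention weights so they sum to ~1 inside one span."""
    t_start, t_end = span
    window = blended_attn[t_start:t_end]
    total = sum(window) + eps
    return [w / total for w in window]


# Two sentences of equal length in a 100-character text encoded as 21 tokens.
spans = sentence_token_spans([(0, 50), (50, 100)], text_len=100, seq_len=21)
```

Since the mapping is proportional rather than tokenizer-exact, adjacent spans can overlap by a token at the boundary; a tokenizer that returns offset mappings would remove that approximation.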
The 12 Sentence-Level Structural Features
Each sentence is augmented with 12 hand-crafted features that capture spam-indicative patterns not easily learned from token embeddings alone:
| # | Feature | Extraction |
|---|---|---|
| 0 | Word count | len(words) |
| 1 | Has greeting | Starts with "hey", "hi", "hello", "dear", "good morning" |
| 2 | Has CTA | Contains "call", "schedule", "book", "demo", "sign up" |
| 3 | Has urgency | Matches against 14 urgency word patterns |
| 4 | Has personalization | Regex: \b(your company\|your team\|you mentioned)\b |
| 5 | Has link | https?:// pattern match |
| 6 | Is question | Ends with "?" |
| 7 | Punctuation density | Punctuation characters / total characters |
| 8 | CAPS ratio | All-caps words / total words |
| 9 | Pronoun ratio | First-person pronouns ("I", "me", "my", "we", "our") / words |
| 10 | Specificity | Capitalized non-common words / total words |
| 11 | Formality | Formal words ("please", "kindly", "sincerely") / total words |
The structural features are projected to 32 dimensions and concatenated with the neural span aggregation (192-dim), giving a 224-dim sentence representation. A 2-layer MLP with GELU activation produces a per-sentence spam score $s_i$.
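To make the table concrete, here is a sketch that extracts a subset of the features (word count, greeting, CTA, personalization, link, question, CAPS ratio). The word lists come from the table; the regex word boundaries, the single-letter exclusion for all-caps words, and the dictionary output format are assumptions.

```python
import re

GREETING = re.compile(r"^(hey|hi|hello|dear|good morning)\b", re.IGNORECASE)
CTA = re.compile(r"\b(call|schedule|book|demo|sign up)\b", re.IGNORECASE)
PERSONALIZATION = re.compile(
    r"\b(your company|your team|you mentioned)\b", re.IGNORECASE
)
LINK = re.compile(r"https?://")


def sentence_features(sentence: str) -> dict:
    """Extract a subset of the 12 structural features for one sentence."""
    text = sentence.strip()
    words = text.split()
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    # Single-letter words such as "I" are excluded from the all-caps count.
    caps_words = sum(1 for w in words if len(w) > 1 and w.isupper())
    return {
        "word_count": len(words),
        "has_greeting": float(bool(GREETING.search(text))),
        "has_cta": float(bool(CTA.search(text))),
        "has_personalization": float(bool(PERSONALIZATION.search(text))),
        "has_link": float(bool(LINK.search(text))),
        "is_question": float(text.endswith("?")),
        "caps_ratio": caps_words / n_words,
    }
```

For example, `sentence_features("ACT NOW! LIMITED TIME!")` yields a CAPS ratio of 1.0 with no greeting or personalization, while a personalized question scores the opposite way on most of these dimensions.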
Document-Level Attention
Sentence embeddings are stacked and aggregated via learned document attention:
sent_stack = torch.cat(sentence_embeds_list, dim=0).unsqueeze(0) # (1, n_sent, 64)
sent_attn = self.doc_attention(sent_stack).softmax(dim=1) # (1, n_sent, 1)
doc_embed = (sent_attn * sent_stack).sum(dim=1) # (1, 64)
This allows the model to focus on the most spam-indicative sentences. An email with one spammy call-to-action sentence surrounded by legitimate content will have its attention concentrated on that sentence, rather than being diluted by the benign context.
Information-Theoretic AI Detection
Detecting AI-generated content is an adversarial problem. Paraphrasing tools, style transfer, and instruction-tuned models make surface-level detection unreliable. The AdversarialStyleTransferDetector uses 32 structural features grouped into four categories, with several drawn from information theory and computational linguistics.
Group 1: Basic Stylistic (Features 1-8)
| # | Feature | Signal |
|---|---|---|
| 1 | Sentence length std | AI text has more uniform sentence lengths |
| 2 | Contraction density | AI underuses contractions ("do not" vs "don't") |
| 3 | Parenthetical/dash count | Human writers use more digressions |
| 4 | Exclamation density | Spam and AI have distinct patterns |
| 5 | First-person density | "I", "I'm", "my", "me" — humans self-reference more |
| 6 | Average word length | AI tends toward longer, more formal vocabulary |
| 7 | Sentence starter variety | AI often starts consecutive sentences identically |
| 8 | Normalized text length | Short vs long text behaves differently |
Group 2: Vocabulary Richness (Features 9-16)
These features measure the statistical properties of word frequency distributions:
Type-Token Ratio (TTR):
$$\mathrm{TTR} = \frac{V}{N}$$
where $V$ is vocabulary size (unique words) and $N$ is total tokens. A crude measure, since it scales with text length, but effective when combined with length normalization.
Hapax Ratio:
$$\text{Hapax ratio} = \frac{V_1}{V}$$
where $V_1$ is the count of hapax legomena (words appearing exactly once). High hapax ratio indicates diverse, non-repetitive vocabulary.
Yule's K (Vocabulary Richness):
Raw word frequency variance is a weak signal because it scales with text length. Yule's K is length-invariant:
$$K = 10^4 \cdot \frac{M_2 - N}{N^2}$$
where $M_2 = \sum_i i^2 V_i$, $V_i$ is the count of words appearing exactly $i$ times, and $N$ is total tokens. AI text has characteristically lower K values than human text because LLMs produce more uniform word frequency distributions: they avoid both very common and very rare words, compressing the frequency spectrum toward the middle.
freq_of_freq = Counter(freq.values())
m2 = sum(i * i * vi for i, vi in freq_of_freq.items())
yules_k = 1e4 * (m2 - N) / max(N * N, 1)
Honore's R Statistic:
$$R = \frac{100 \log N}{1 - V_1 / V}$$
where $N$ is total tokens, $V$ is vocabulary size, and $V_1$ is the number of words appearing exactly once. This measures vocabulary diversity independent of text length. A high R indicates rich, varied vocabulary. AI-generated text has a characteristic R profile that differs from human writing: not always lower, but distributed differently across text lengths. The denominator captures the proportion of non-hapax vocabulary: text with mostly unique words (high hapax) drives R toward infinity, while text with heavily reused words drives R down.
if V > 0 and hapax < V:
    honore_r = 100.0 * math.log(max(N, 1)) / max(1.0 - hapax / V, 0.01)
else:
    honore_r = 0.0
honore_r = min(honore_r / 1000.0, 1.0)  # normalize to ~[0,1]
The remaining features in this group: clause nesting density (commas + semicolons per sentence), conjunction density, hedging density ("perhaps", "might", "possibly"), filler word density ("basically", "literally", "honestly"), and list pattern detection.
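The four richness statistics can be computed together from one frequency table. The sketch below follows the formulas and snippets in this group; the `vocabulary_richness` wrapper, the lower-casing, and returning Honore's R before the division by 1000 are illustrative choices.

```python
import math
from collections import Counter


def vocabulary_richness(words):
    """TTR, hapax ratio, Yule's K and Honore's R (unnormalized) for a word list."""
    freq = Counter(w.lower() for w in words)
    N = sum(freq.values())  # total tokens
    V = len(freq)           # vocabulary size
    if N == 0:
        return {"ttr": 0.0, "hapax_ratio": 0.0, "yules_k": 0.0, "honore_r": 0.0}
    hapax = sum(1 for count in freq.values() if count == 1)

    # Yule's K: M2 is the sum of i^2 * V_i over the frequency-of-frequency table.
    freq_of_freq = Counter(freq.values())
    m2 = sum(i * i * vi for i, vi in freq_of_freq.items())
    yules_k = 1e4 * (m2 - N) / (N * N)

    # Honore's R: set to 0 when every word is a hapax, as in the snippet above.
    if hapax < V:
        honore_r = 100.0 * math.log(N) / max(1.0 - hapax / V, 0.01)
    else:
        honore_r = 0.0

    return {
        "ttr": V / N,
        "hapax_ratio": hapax / V,
        "yules_k": yules_k,
        "honore_r": honore_r,
    }


stats = vocabulary_richness("a a b c".split())
```

For the four-token example, $N = 4$, $V = 3$, $V_1 = 2$ and $M_2 = 1^2 \cdot 2 + 2^2 \cdot 1 = 6$, so TTR is 0.75 and Yule's K is $10^4 \cdot (6 - 4) / 16 = 1250$.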
Group 3: Stylistic Fingerprints (Features 17-24)
| # | Feature | Why It Matters |
|---|---|---|
| 17 | Question density | Human sales emails ask more questions |
| 18 | 1st-vs-3rd person ratio | Self-referential vs expository balance |
| 19 | Passive voice ratio | was/were + *ed pattern; AI overuses passive |
| 20 | Information density | Content words / total words |
| 21 | Paragraph length variance | AI produces uniform paragraph blocks |
| 22 | Opening sentence length | Human emails often start with short greetings |
| 23 | Honore's R statistic | See above |
| 24 | Transition word density | "therefore", "furthermore", "consequently" — AI overuses transitions |
Group 4: Adversarial Fingerprints (Features 25-32)
Shannon Word Entropy:
The entropy of the word frequency distribution measures predictability:
$$H = -\sum_{w} p(w) \log_2 p(w)$$
normalized by $\log_2 V$, where $V$ is vocabulary size. AI text tends toward lower entropy (more predictable, more uniform distributions) because language models optimize for the most probable next token. The normalization ensures the feature is comparable across texts of different vocabulary sizes.
word_entropy = 0.0
for count in freq.values():
    p = count / N
    if p > 0:
        word_entropy -= p * math.log2(p)
word_entropy /= max(math.log2(max(V, 2)), 1.0)  # normalize to [0, 1]
N-gram Repetition Score: Combined bigram + trigram repetition rate. Template spam and AI text both show higher n-gram repetition, but for different reasons: templates reuse exact phrases, while AI has subtler lexical loops.
bigrams = [f"{words[i]} {words[i+1]}" for i in range(len(words)-1)]
trigrams = [f"{words[i]} {words[i+1]} {words[i+2]}" for i in range(len(words)-2)]
bi_rep = 1.0 - (len(set(bigrams)) / max(len(bigrams), 1))
tri_rep = 1.0 - (len(set(trigrams)) / max(len(trigrams), 1))
repetition_score = (bi_rep + tri_rep) / 2.0
Formality Index: Ratio of formal words ("sincerely", "pursuant", "herein") to informal words ("hey", "cool", "gonna", "btw"). AI tends toward higher formality — LLMs default to a formal register unless explicitly prompted otherwise.
Self-Reference Pattern: Ratio of self-reference pronouns ("I", "we", "our") to other-reference ("you", "your"). Legitimate sales emails focus on the recipient; AI-generated bulk content often defaults to self-promotional language.
Template Markers: Binary detection of template placeholders — {{, [[, <NAME>, [Company], __FIELD__ patterns that escaped variable substitution.
Unicode Anomaly Score: Weighted combination of non-ASCII characters, Cyrillic homoglyphs (6 codepoints that look identical to Latin: а, е, о, р, с, х), and zero-width characters. Homoglyphs get 5x weight, zero-width chars get 10x, because these indicate active deception rather than legitimate internationalization.
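A sketch of the weighted Unicode score, using the six homoglyph codepoints and the 5x / 10x weights stated above. The zero-width character set, the per-character normalization, and the clipping to 1.0 are assumptions; the original combination rule is not shown.

```python
# Cyrillic letters that render like Latin a, e, o, p, c, x.
HOMOGLYPHS = set("\u0430\u0435\u043e\u0440\u0441\u0445")
# Assumed zero-width set: ZWSP, ZWNJ, ZWJ and the BOM / zero-width no-break space.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def unicode_anomaly_score(text: str) -> float:
    """Weighted share of suspicious characters, clipped to [0, 1]."""
    if not text:
        return 0.0
    weighted = 0.0
    for ch in text:
        if ch in ZERO_WIDTH:
            weighted += 10.0  # invisible characters: strongest signal
        elif ch in HOMOGLYPHS:
            weighted += 5.0   # look-alike letters: active deception
        elif ord(ch) > 127:
            weighted += 1.0   # ordinary non-ASCII: weak signal only
    return min(weighted / len(text), 1.0)
```

With these weights, a single Cyrillic "а" inside "paypal" scores far higher than the "é" in "café", which matches the intent of penalizing deception rather than legitimate internationalization.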
Trajectory Smoothness
LLMs maintain topic coherence too consistently. Human writing has natural velocity changes: digressions, asides, topic shifts. The trajectory smoothness feature measures the mean cosine similarity between consecutive positions sampled along the projected token-embedding sequence:
projected = self.trajectory_proj(tokens[0]) # (seq, 32)
sampled = projected[::stride] # (~20, 32) evenly spaced
normed = F.normalize(sampled, dim=-1)
consec_cos = (normed[:-1] * normed[1:]).sum(dim=-1)
smoothness = consec_cos.mean().item()
High smoothness (consistently high cosine similarity between consecutive positions) is a signal of AI generation. Human text has lower mean similarity and higher variance — the embedding trajectory is "jerkier."
Watermark Detection
The watermark head detects statistical patterns from LLM watermarking schemes (Kirchenbauer et al. 2023), where the model biases token selection toward a pseudo-random "green list." The detection head is a 2-layer MLP operating on the CLS embedding, producing a probability that watermarking artifacts are present.
Combining Signals
All detection signals are fused through a combiner MLP:
combiner_input = torch.cat([
log_ratio.view(1, -1), # perplexity ratio (1)
(human_score + ai_score).view(1, -1), # magnitude (1)
struct_embed, # 32 features → 64-dim (64)
torch.tensor([[trajectory, watermark, template_score]],
device=cls.device), # (3)
], dim=-1) # total: 69-dim
ai_risk = self.combiner(combiner_input).item() # → sigmoid → [0, 1]
The perplexity ratio is the difference between learned human and AI pattern scorers operating on the CLS embedding: $\text{log\_ratio} = \text{human\_score} - \text{ai\_score}$. Positive values indicate human-like patterns; negative values indicate AI patterns. The magnitude captures overall confidence.
The Seven-Category Taxonomy
Binary spam classification hides actionable information. The module classifies into seven categories, each requiring different handling:
| Category | Description | Action |
|---|---|---|
| clean | Legitimate personalized email | Pass |
| template_spam | Mass-sent templates with token substitution | Quarantine |
| ai_generated | LLM-generated content | Flag for review |
| low_effort | Generic, no personalization | Quarantine |
| role_account | info@, noreply@, billing@ | Route to appropriate handler |
| domain_suspect | Disposable/newly-registered domains | Block |
| content_violation | Urgency manipulation, deceptive subject | Block |
The classification head sits on top of the document-level embedding (64-dim), concatenated with 8 information-theoretic document features projected to 48 dimensions, giving a 112-dim input to a 2-layer MLP:
cat_input = torch.cat([doc_embed, doc_feat_embed], dim=-1) # (1, 112)
category_logits = self.category_head(cat_input) # (1, 7)
The 8 Document-Level Features
| # | Feature | Extraction |
|---|---|---|
| 0 | Text length | min(word_count / 500, 1.0) |
| 1 | Link density | link_count / word_count * 100 |
| 2 | CAPS ratio | All-caps words / total words |
| 3 | Sentence count | n_sentences / 20.0 |
| 4 | Character Shannon entropy | $-\sum_c p(c) \log_2 p(c)$, normalized by $\log_2$ of the alphabet size |
| 5 | Urgency word count | Raw count of urgency pattern matches |
| 6 | Template markers | Binary: {{, [[, <NAME>, [Company] present |
| 7 | Compression ratio | unique_chars / text_length (low = repetitive/templated) |
Character Shannon entropy is particularly useful for detecting image-only or heavily encoded spam. Legitimate email text has a characteristic entropy range; emails that are mostly HTML tags or base64-encoded content have distinctly different profiles.
Uncertainty Decomposition
A spam score of 0.6 is useless without knowing why the model is uncertain. The module decomposes uncertainty into two components:
Aleatoric uncertainty (inherent ambiguity): the normalized entropy of the category probability distribution. High aleatoric uncertainty means the email itself is genuinely ambiguous — it has properties of multiple categories simultaneously.
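$$U_{\text{aleatoric}} = \frac{-\sum_{c=1}^{C} p_c \log p_c}{\log C}$$
where $C$ is the number of categories and $p_c$ is the softmax probability of category $c$. The denominator normalizes to $[0, 1]$: a uniform distribution gives $1$, a one-hot distribution gives $0$.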
Epistemic uncertainty (model uncertainty): the mean variance of the Beta distributions across all token posteriors. High epistemic uncertainty means the model hasn't seen enough training data similar to this email.
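$$U_{\text{epistemic}} = \frac{1}{|T|} \sum_{t \in T} \frac{\alpha_t \beta_t}{(\alpha_t + \beta_t)^2 (\alpha_t + \beta_t + 1)}$$
This is the exact variance formula for the Beta distribution, averaged across all tokens and aspects ($T$ indexes every token-aspect pair). A Beta($1, 1$) uniform prior has maximum variance of $1/12$; as the model becomes more confident (higher $\alpha + \beta$), variance approaches zero.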
# Aleatoric: category distribution entropy
cat_entropy = -(category_probs * (category_probs + 1e-8).log()).sum(dim=-1)
aleatoric = (cat_entropy / math.log(category_logits.shape[-1])).item()
# Epistemic: mean Beta variance across all tokens
beta_var = (alpha * beta) / ((alpha + beta).pow(2) * (alpha + beta + 1))
epistemic = beta_var.mean().item()
gate_confidence = max(0.0, min(1.0, 1.0 - (aleatoric + epistemic) / 2))
The combined confidence tells operators: "I'm uncertain because the email is ambiguous" (high aleatoric, low epistemic) versus "I'm uncertain because I haven't seen emails like this" (low aleatoric, high epistemic). The latter case is a signal to add more training data. The former is a signal that the email needs human review.
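The decomposition and the "quarantine rather than block" policy can be sketched without tensors. The two uncertainty functions follow the definitions in this section; the `route` function and its three thresholds are hypothetical and would need tuning against real false-positive costs.

```python
import math


def aleatoric(probs):
    """Normalized entropy of the category distribution, in [0, 1]."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def epistemic(alphas, betas):
    """Mean Beta variance across token posteriors."""
    variances = [
        (a * b) / ((a + b) ** 2 * (a + b + 1)) for a, b in zip(alphas, betas)
    ]
    return sum(variances) / len(variances)


def gate_confidence(aleatoric_u, epistemic_u):
    return max(0.0, min(1.0, 1.0 - (aleatoric_u + epistemic_u) / 2))


def route(spam_score, confidence, pass_below=0.3, block_at=0.9, min_confidence=0.7):
    """Hypothetical routing policy: block only when both score and confidence are high."""
    if spam_score < pass_below:
        return "pass"
    if spam_score >= block_at and confidence >= min_confidence:
        return "block"
    return "quarantine"  # uncertain or mid-range: send to human review
```

A high spam score with low confidence lands in quarantine instead of being blocked, which keeps the expensive error (discarding a real lead) behind a human check.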
The Six Sub-Modules
A. HierarchicalBayesianAttentionGate
The core module described above. Computes token → sentence → document hierarchical gating with 4 aspect probes and Beta posteriors. Returns category logits, gate score, per-token contributions, per-sentence scores, aspect attributions, and uncertainty decomposition.
B. AdversarialStyleTransferDetector
The 32-feature AI detection module described above. Additionally computes:
- Perplexity ratio: Dual-head architecture with separate human-pattern and AI-pattern scorers (3-layer MLPs: 768 → 128 → 64 → 1). The log-ratio acts as a discriminative signal.
- Trajectory smoothness: Sentence-level embedding cosine similarity mean, computed on ~20 evenly-spaced projected positions.
- Watermark detection: Kirchenbauer-style green-list bias detection via learned head on CLS.
C. HeaderAnalyzer
Extracts a 16-dimensional feature vector from email headers:
[
spf_pass, spf_fail, spf_none, # one-hot (3)
dkim_pass, dkim_fail, dkim_none, # one-hot (3)
dmarc_pass, dmarc_fail, dmarc_none, # one-hot (3)
hop_count / 20.0, # normalized (1)
reply_to_mismatch, # binary (1)
return_path_mismatch, # binary (1)
has_list_unsubscribe, # binary (1)
known_mailer, # binary (1)
sin(2*pi*hour/24), # circadian encoding (1)
cos(2*pi*hour/24), # circadian encoding (1)
]
The circadian encoding captures send-time patterns without discontinuity at midnight. A sine/cosine pair maps the 24-hour cycle to a continuous 2D circle: hour 0 and hour 23 are adjacent, not 23 apart. Spam campaigns cluster at specific hours; legitimate business emails follow predictable circadian patterns per timezone.
The one-hot encoding for SPF/DKIM/DMARC (rather than a single pass/fail bit) allows the model to learn that "none" (no authentication) is a different signal than "fail" (authentication attempted and failed). Failed DKIM is more suspicious than absent DKIM.
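Both encodings are small enough to verify by hand. The sketch below shows the sine/cosine pair and a three-way authentication one-hot; the function names, the `pass` / `fail` / `none` ordering, and the choice to fold any other result into `none` are assumptions for illustration.

```python
import math


def circadian(hour: float):
    """Map an hour of day onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


def circadian_distance(h1: float, h2: float) -> float:
    """Euclidean distance between two encoded hours."""
    s1, c1 = circadian(h1)
    s2, c2 = circadian(h2)
    return math.hypot(s1 - s2, c1 - c2)


def auth_one_hot(result: str):
    """Encode an SPF/DKIM/DMARC result as [pass, fail, none].

    Assumption: anything other than pass/fail is treated as none.
    """
    r = result.strip().lower()
    return [float(r == "pass"), float(r == "fail"), float(r not in ("pass", "fail"))]
```

In the encoded space, 23:00 and 00:00 are as close as 00:00 and 01:00, while noon and midnight sit at opposite ends of the circle (distance 2).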
D. TemporalBurstDetector
Analyzes cross-email send timestamps for burst patterns and cadence regularity. The 8-dimensional feature vector:
| # | Feature | Signal |
|---|---|---|
| 0 | Mean interval | Average time between emails (normalized to hours) |
| 1 | Interval variance | High variance = irregular sending; low = automated |
| 2 | Burst fraction | Proportion of intervals < 60 seconds |
| 3 | Cadence regularity | Decreasing function of $\mathrm{CV}$, where $\mathrm{CV}$ = coefficient of variation of the intervals |
| 4 | Time-of-day entropy | Entropy over 24-hour bins, normalized by $\log 24$ |
| 5 | Day-of-week entropy | Entropy over 7-day bins, normalized by $\log 7$ |
| 6 | Volume | min(n_emails / 100, 1.0) |
| 7 | Acceleration | Is send rate increasing? First-half vs second-half mean interval |
Ten emails in ten seconds from the same sender is a clear campaign burst (burst fraction ≈ 1.0). A perfectly regular cadence (low CV, high regularity) with low time-of-day entropy suggests an automated scheduler. Human senders have high entropy across both time-of-day and day-of-week; bots concentrate in narrow windows.
The acceleration feature detects campaigns that ramp up: if the second half of observed intervals is shorter than the first half, the sender is accelerating. This catches "start slow, then blast" patterns common in warming campaigns.
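A sketch of the interval-based features (mean interval, coefficient of variation, burst fraction, acceleration) computed from raw send timestamps in seconds. The `burst_features` function and its output keys are hypothetical, and the exact regularity formula is left out because the original does not show it.

```python
import statistics


def burst_features(timestamps):
    """Interval statistics for one sender's send times (seconds since epoch)."""
    ts = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not intervals:
        return {
            "mean_interval_hours": 0.0,
            "cv": 0.0,
            "burst_fraction": 0.0,
            "accelerating": False,
        }
    mean_iv = statistics.fmean(intervals)
    # Coefficient of variation: low CV means a regular, scheduler-like cadence.
    cv = statistics.pstdev(intervals) / mean_iv if mean_iv > 0 else 0.0
    burst_fraction = sum(1 for iv in intervals if iv < 60) / len(intervals)
    # Acceleration: is the second half of the intervals shorter than the first?
    half = len(intervals) // 2
    accelerating = False
    if half >= 1:
        accelerating = statistics.fmean(intervals[half:]) < statistics.fmean(
            intervals[:half]
        )
    return {
        "mean_interval_hours": mean_iv / 3600.0,
        "cv": cv,
        "burst_fraction": burst_fraction,
        "accelerating": accelerating,
    }
```

Ten emails sent one second apart give a burst fraction of 1.0 and a CV of 0, while a sender whose gaps shrink from an hour to half a minute is flagged as accelerating.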
E. CampaignSimilarityDetector
Computes pairwise cosine similarity of CLS embeddings across a batch. The detection threshold: if >70% of email pairs have cosine similarity >0.85, it's a template campaign.
The cluster counting uses proper union-find with path compression (path halving, union by rank):
parent = list(range(n))
rank = [0] * n

def _find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def _union(a, b):
    ra, rb = _find(a), _find(b)
    if ra == rb:
        return
    if rank[ra] < rank[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    if rank[ra] == rank[rb]:
        rank[ra] += 1

# Merge every pair of emails whose similarity exceeds the threshold
for i in range(n):
    for j in range(i + 1, n):
        if sim_matrix[i, j].item() > 0.85:
            _union(i, j)

cluster_ratio = len(set(_find(i) for i in range(n))) / max(n, 1)
The 4-dimensional output: max pairwise similarity, mean pairwise similarity, fraction above threshold, and cluster ratio (number of distinct clusters / total emails). A cluster ratio near 1.0 means all emails are unique; near $1/n$ means they all merged into a single cluster.
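The pairwise statistics and the 70% / 0.85 detection rule can be sketched in plain Python as follows. Embeddings are ordinary lists of floats here rather than CLS tensors, and the function names and dictionary keys are hypothetical.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def campaign_stats(embeddings, sim_threshold=0.85, pair_fraction=0.70):
    """Max/mean pairwise similarity, fraction above threshold, and campaign flag."""
    sims = [cosine(a, b) for a, b in combinations(embeddings, 2)]
    if not sims:
        return {"max_sim": 0.0, "mean_sim": 0.0, "frac_above": 0.0, "is_campaign": False}
    frac_above = sum(s > sim_threshold for s in sims) / len(sims)
    return {
        "max_sim": max(sims),
        "mean_sim": sum(sims) / len(sims),
        "frac_above": frac_above,
        "is_campaign": frac_above > pair_fraction,
    }
```

The cluster ratio is omitted because the union-find code above already covers it.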
F. ProviderCalibration
Six provider-specific MLPs (Gmail, Outlook, Yahoo, ProtonMail, Apple Mail, Corporate) each take 10 features and produce calibrated deliverability scores:
# 10-dim input per provider
feat_vec = [
spam_score, ai_risk, text_length_norm, link_density,
urgency_count_norm, header_auth_score, template_marker,
caps_ratio, sentence_count_norm, role_account,
]
Each provider MLP has the architecture: Linear(10, 32) → GELU → Dropout(0.1) → Linear(32, 16) → ReLU → Linear(16, 1) → Sigmoid.
Rule-based adjustments are applied on top of the learned scores using empirical provider thresholds:
| Provider | Base | Link Penalty | Urgency Penalty |
|---|---|---|---|
| Gmail | 0.45 | 0.08 | 0.12 |
| Outlook | 0.40 | 0.10 | 0.10 |
| Yahoo | 0.50 | 0.06 | 0.15 |
| ProtonMail | 0.35 | 0.12 | 0.08 |
| Apple Mail | 0.42 | 0.07 | 0.11 |
| Corporate | 0.38 | 0.10 | 0.10 |
ProtonMail has the lowest base threshold (most aggressive filtering) and the highest link penalty, consistent with its privacy-focused positioning. Yahoo has the highest base threshold (most lenient) but also the highest urgency penalty. Gmail has the second-highest base threshold and penalizes urgency more than links.
An adversarial discriminator forces the predicted scores to match empirical inbox placement distributions. The discriminator takes a (score, real/fake) pair and learns to distinguish model-predicted deliverability from ground-truth measurements. The generator loss pushes the provider MLPs to produce scores the discriminator cannot distinguish from real data:
d_loss = BCE(D(real_score, 1), ones) + BCE(D(pred_score, 0), zeros)
g_loss = BCE(D(pred_score, 0), ones) # fool the discriminator
Residual Gate Decision
The final spam score comes from a residual MLP that fuses seven signals from the sub-modules above:
self.gate_norm = nn.LayerNorm(7)
self.gate_trunk = nn.Sequential(
nn.Linear(7, 64), nn.GELU(), nn.Dropout(0.1),
nn.Linear(64, 32), nn.GELU(),
)
self.gate_residual = nn.Linear(7, 32) # skip connection
self.gate_out = nn.Sequential(nn.LayerNorm(32), nn.Linear(32, 1), nn.Sigmoid())
The seven inputs are:
| # | Signal | Source |
|---|---|---|
| 0 | Base gate score | HierarchicalBayesianAttentionGate |
| 1 | AI risk | AdversarialStyleTransferDetector |
| 2 | Header auth score | HeaderAnalyzer (mean of SPF/DKIM/DMARC pass signals) |
| 3 | Temporal anomaly | TemporalBurstDetector (burst fraction) |
| 4 | Campaign similarity | CampaignSimilarityDetector |
| 5 | Role account indicator | Address prefix matching |
| 6 | Urgency count (norm) | urgency_count / 5.0 |
The skip connection prevents gradient degradation when training with the multi-task loss, and the layer normalization stabilizes the heterogeneous input scales (some signals are probabilities in $[0, 1]$, others are normalized counts). The residual path gate_residual(x) provides a direct linear shortcut from raw signals to the output, so the trunk only needs to learn corrections to the linear baseline.
Gate Decisions and Risk Levels
The final spam score maps to a three-tier gate:
| Score | Gate decision | Risk level |
|---|---|---|
| < 0.3 | Pass | low |
| 0.3 to < 0.5 | Quarantine | medium |
| 0.5 to < 0.7 | Quarantine | high |
| ≥ 0.7 | Block | critical |
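As a sketch, the mapping reduces to a chain of threshold checks. The function name and the lowercase return strings are illustrative.

```python
def gate_decision(score: float) -> tuple[str, str]:
    """Map a spam score in [0, 1] to (gate decision, risk level)."""
    if score < 0.3:
        return "pass", "low"
    if score < 0.5:
        return "quarantine", "medium"
    if score < 0.7:
        return "quarantine", "high"
    return "block", "critical"
```

Quarantine spans two risk levels, so downstream consumers can treat the 0.5 to 0.7 band more cautiously without changing the gate action.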
Risk Factor Attribution
Nine named risk factors are detected and reported with severity scores:
| Risk Factor | Trigger | Severity |
|---|---|---|
| urgency_manipulation | ≥ 2 urgency words | count / 5.0 |
| link_overload | > 3 links | count / 10.0 |
| url_shortener | Any shortened URL | count / 3.0 |
| encoding_tricks | Template markers detected | 0.5 |
| homoglyph_attack | Unicode anomaly > 0.01 | Anomaly score |
| reply_to_mismatch | Reply-To ≠ From domain | 0.8 |
| image_only | Mostly image tags | Image ratio |
| invisible_text | Zero-width characters | Char count |
| zero_width_chars | \u200B, \u200C, \u200D, \uFEFF | Char count |
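Three of these triggers are simple enough to sketch directly. The urgency word list below is a hypothetical stand-in (the real list is not given here), and severities are left uncapped because the table does not specify a cap.

```python
import re

URGENCY_WORDS = {"urgent", "immediately", "now", "expires", "deadline"}  # hypothetical list
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}
LINK_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def detect_risk_factors(text: str) -> dict[str, float]:
    """Return {risk factor name: severity} for the factors that fire."""
    factors: dict[str, float] = {}

    words = re.findall(r"[a-z']+", text.lower())
    urgency = sum(1 for w in words if w in URGENCY_WORDS)
    if urgency >= 2:
        factors["urgency_manipulation"] = urgency / 5.0

    links = len(LINK_RE.findall(text))
    if links > 3:
        factors["link_overload"] = links / 10.0

    zero_width = sum(1 for ch in text if ch in ZERO_WIDTH)
    if zero_width:
        factors["zero_width_chars"] = float(zero_width)

    return factors
```

Returning only the factors that fired keeps the report short for clean emails, where the dictionary is simply empty.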
Multi-Task Loss with Uncertainty Weighting
Training uses the Kendall et al. (2018) uncertainty-weighted multi-task loss. Each task $i$ has a learned log-variance parameter $s_i = \log \sigma_i^2$ that automatically scales its contribution:
$$\mathcal{L} = \sum_i \left( e^{-s_i}\,\mathcal{L}_i + s_i \right)$$
The precision $e^{-s_i}$ weights each task's loss. The regularizer $+s_i$ prevents all precisions from collapsing to zero (which would minimize the loss trivially).
prec_cat = torch.exp(-self.log_var_cat) # learned precision
prec_gate = torch.exp(-self.log_var_gate)
prec_ai = torch.exp(-self.log_var_ai)
loss_cat = prec_cat * F.cross_entropy(category_logits, true_category) + self.log_var_cat
loss_gate = prec_gate * F.binary_cross_entropy(gate_score, true_is_spam) + self.log_var_gate
loss_ai = prec_ai * F.binary_cross_entropy(ai_risk, true_is_ai) + self.log_var_ai
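A small numeric sketch shows how the weighting behaves. For a fixed task loss L, the term exp(-s) * L + s is minimized at s = log L, so tasks with larger losses settle at lower precision. The function name and the sample loss values are illustrative.

```python
import math


def weighted_loss(task_losses: list[float], log_vars: list[float]) -> float:
    """Sum of exp(-s_i) * L_i + s_i over tasks."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


losses = [1.2, 0.4, 0.1]                      # made-up per-task losses
optimal = [math.log(loss) for loss in losses]  # stationary point of each term
baseline = weighted_loss(losses, [0.0, 0.0, 0.0])
at_optimum = weighted_loss(losses, optimal)
```

At the optimum each term equals 1 + log L, which is never larger than the unweighted value L, so `at_optimum` does not exceed `baseline`.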
The five loss components:
- Category cross-entropy (7-way classification): The primary classification objective
- Gate BCE (binary spam/not-spam): The gating decision
- AI detection BCE (binary AI/human): The content authenticity signal
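- KL regularization (Beta posteriors vs uniform prior): Prevents posterior collapse
$$\mathrm{KL}\big(\mathrm{Beta}(\alpha, \beta)\,\|\,\mathrm{Beta}(1, 1)\big) = -\ln B(\alpha, \beta) + (\alpha - 1)\,\psi(\alpha) + (\beta - 1)\,\psi(\beta) - (\alpha + \beta - 2)\,\psi(\alpha + \beta)$$
where $B$ is the Beta function and $\psi$ is the digamma function. This regularizer pulls the learned posteriors toward the uniform Beta(1,1) prior, preventing overconfident token-level predictions. The KL weight is 0.01: light regularization that allows the model to deviate from the prior when the data supports it.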
- Adversarial calibration (provider discriminator + generator): Activated after epoch 3 warmup with weight 0.1, giving the classification heads time to converge before the adversarial signal introduces instability.
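The KL term against the uniform Beta(1,1) prior has a closed form that can be checked numerically with the standard library. The sketch below assumes a hand-rolled digamma (recurrence plus asymptotic series), since Python's `math` module does not provide one; the function names are illustrative.

```python
import math


def digamma(x: float) -> float:
    """Digamma for x > 0: shift x up by recurrence, then apply an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def kl_beta_vs_uniform(alpha: float, beta: float) -> float:
    """KL( Beta(alpha, beta) || Beta(1, 1) )."""
    log_beta_fn = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (
        -log_beta_fn
        + (alpha - 1.0) * digamma(alpha)
        + (beta - 1.0) * digamma(beta)
        - (alpha + beta - 2.0) * digamma(alpha + beta)
    )
```

The divergence is zero at alpha = beta = 1 and grows as the posterior sharpens, which is the pull toward the prior described above.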
The Rust Distillation Path
The DeBERTa model is too expensive for production gating at scale. The distillation pipeline converts the neural classifier into a 24-feature logistic regression that runs in pure Rust with zero ML dependencies.
Distillation Pipeline
The distillation process:
- Data export (export_spam_data.py, 498 lines): Fetches emails from Neon PostgreSQL (contact_emails, received_emails tables), labels via heuristic rules based on personalization scores, template IDs, keyword density, and word count thresholds
- Soft label generation: Runs the DeBERTa SpamHead on all emails to produce 7-class probability distributions
- Feature extraction: Computes the same 24-element feature vector used by the Rust classifier
- One-vs-Rest training: Fits 7 independent logistic regressions using SGD with L2 regularization ($\lambda = 0.01$), 500 epochs, learning rate 0.1
- Weight export: Serializes as JSON matching the Rust SpamClassifier struct format
import numpy as np

def train_logistic(X, y, epochs=500, lr=0.1):
    n, d = X.shape  # n samples, d features (24 here)
    w = np.zeros(d, dtype=np.float64)
    b = 0.0
    for epoch in range(epochs):
        z = X @ w + b
        pred = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))  # clipped sigmoid
        error = pred - y
        grad_w = (X.T @ error) / n + 0.01 * w  # L2 regularization
        grad_b = error.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w.astype(np.float32), float(b)
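The weight-export step can be sketched as below. The JSON key names mirror the field names of the Rust SpamClassifier struct (weights, biases, trained), but the exact on-disk layout is an assumption, and `export_weights` is a hypothetical helper name.

```python
import json

N_CLASSES, N_FEATURES = 7, 24


def export_weights(weights: list[list[float]], biases: list[float]) -> str:
    """Serialize 7 x 24 one-vs-rest weights and 7 biases as a JSON string."""
    if len(weights) != N_CLASSES or any(len(row) != N_FEATURES for row in weights):
        raise ValueError("expected 7 weight vectors of 24 features each")
    if len(biases) != N_CLASSES:
        raise ValueError("expected 7 biases")
    payload = {
        "weights": [[float(v) for v in row] for row in weights],
        "biases": [float(b) for b in biases],
        "trained": True,
    }
    return json.dumps(payload)
```

Validating the shapes at export time catches a mismatch between the Python feature extractor and the fixed-size Rust arrays before the file is ever loaded.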
Feature Extraction (24 dimensions)
The Rust extract_spam_features() function mirrors the Python feature set using zero-copy byte scanning:
| # | Feature | Extraction Method |
|---|---|---|
| 0-1 | Spam/urgency keyword density | Keyword match count / word count (29 spam keywords, 14 urgency keywords) |
| 2-4 | Link count, URL shorteners, image tags | Byte-level pattern scanning for http://, bit.ly, <img |
| 5-6 | Exclamation density, ALL CAPS ratio | `text.bytes().filter(\|&b\| b == b'!').count()` / character classification |
| 7-9 | Sentence length variance, pronouns, contractions | Split-and-count with variance normalization |
| 10-13 | Type-token ratio, word length, starter variety, text length | HashSet for unique words, normalized to [0,1] |
| 14-17 | Unicode anomalies, homoglyphs, zero-width chars, template markers | Codepoint scanning (6 Cyrillic homoglyphs, 4 zero-width chars) |
| 18-20 | SPF+DKIM+DMARC composite, reply-to mismatch, hop count | Parsed headers via EmailHeaders struct |
| 21-22 | Send hour sine/cosine | (hour / 24.0 * TAU).sin(), .cos() |
| 23 | Role account indicator | Prefix matching against 15 role account patterns |
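For cross-checking the Rust extractor, a few of these features can be mirrored in Python. This is a sketch only: the normalizations, the prefix list, and the function name are assumptions, and the real extractor may differ in each of them.

```python
ROLE_PREFIXES = ("info@", "sales@", "support@", "noreply@", "admin@")  # hypothetical subset


def extract_some_features(text: str, from_address: str) -> dict[str, float]:
    """Mirror features 5, 6, 10 and 23 with assumed normalizations."""
    n_chars = max(len(text), 1)
    words = text.lower().split()
    letters = [ch for ch in text if ch.isalpha()]
    return {
        "exclamation_density": text.count("!") / n_chars,
        "caps_ratio": sum(ch.isupper() for ch in letters) / max(len(letters), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        "role_account": 1.0 if from_address.lower().startswith(ROLE_PREFIXES) else 0.0,
    }
```

Running both extractors over the same sample emails and comparing the vectors is a cheap way to catch drift between the training features and the production features.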
Rust Classifier
The distilled model is a simple struct with 7 weight vectors and 7 biases:
pub struct SpamClassifier {
    pub weights: Vec<[f32; 24]>, // 7 classifiers × 24 features
    pub biases: [f32; 7],
    pub trained: bool,
}

impl SpamClassifier {
    pub fn classify(&self, features: &[f32; 24]) -> [f32; 7] {
        let mut scores = [0.0f32; 7];
        for i in 0..7 {
            // Dot product of classifier i's weights with the feature vector
            let mut z = self.biases[i];
            for j in 0..24 {
                z += self.weights[i][j] * features[j];
            }
            scores[i] = sigmoid(z);
        }
        scores
    }
}
The spam score is 1.0 - clean_score (index 0 is "clean"). The dominant category is the argmax. Gate thresholds: Pass (score below 0.3), Quarantine (0.3 to 0.7), Block (above 0.7).
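A Python reference of the same scoring rule can help verify that exported weights produce identical scores on both sides. The sketch below assumes the sigmoid input is clamped as in the training code; the function names are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function with the input clamped to avoid overflow."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def classify(weights, biases, features):
    """One-vs-rest scores: sigmoid(bias + dot(weights[i], features)) per class."""
    return [
        sigmoid(b + sum(w * x for w, x in zip(row, features)))
        for row, b in zip(weights, biases)
    ]


def spam_score_and_category(scores):
    """Spam score is 1 - clean score (index 0); category is the argmax."""
    category = max(range(len(scores)), key=scores.__getitem__)
    return 1.0 - scores[0], category
```

Because the seven classifiers are independent, the scores do not sum to 1; the spam score depends only on the clean classifier's output.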
Batch Processing (SoA Layout)
The SpamBatch struct uses Structure-of-Arrays layout with 64-byte cache alignment for optimal auto-vectorization on ARM NEON and x86 SSE:
#[repr(C, align(64))]
pub struct SpamBatch {
    pub features: [[f32; 24]; 256],
    pub spam_scores: [f32; 256],
    pub category_idx: [u8; 256],
    pub gate_decisions: [u8; 256],
    pub count: usize,
}
Why 64-byte alignment? Modern CPUs load data in cache lines of 64 bytes. When the feature array starts at a cache-line boundary, every 16-byte vector load falls within a single cache line (each 96-byte feature row is a multiple of 16 bytes), and the compiler can emit aligned NEON/SSE load instructions (LDR Q / MOVAPS) instead of unaligned ones. The inner loop of classify(), a dot product of 24 f32 values, reads 96 bytes (1.5 cache lines) per email.
A batch of 256 emails is scored in a single pass with ergonomic batch building:
// Build batch
let mut batch = SpamBatch::new();
for email in &emails {
    batch.push(&email.text, Some(&email.headers));
}

// Score all
batch.compute_scores(&classifier);

// Analytics
let pass_rate = batch.pass_rate(0.3);
let distribution = batch.category_distribution();
let clean_indices = batch.passed_indices(0.3);
Domain Filtering (Bloom Filter)
Before feature extraction even runs, a Bloom filter checks the sender domain against known spam and disposable email provider lists. The filter uses double hashing with AHash:
use ahash::AHasher;
use std::hash::{Hash, Hasher};

fn double_hash(item: &[u8]) -> (u64, u64) {
    let mut h1 = AHasher::default();
    item.hash(&mut h1);
    let hash1 = h1.finish();

    // Second hash mixes in the first so the two values differ
    let mut h2 = AHasher::default();
    hash1.hash(&mut h2);
    item.hash(&mut h2);
    let hash2 = h2.finish();

    (hash1, hash2)
}

// Combined hash for the i-th probe: h1 + i*h2
fn combined_hash(h1: u64, h2: u64, i: u32) -> u64 {
    h1.wrapping_add((i as u64).wrapping_mul(h2))
}
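Optimal sizing follows the standard formulas:
$$m = -\frac{n \ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n} \ln 2$$
where $m$ is bit count, $n$ is expected capacity, and $p$ is target false positive rate. For 1,000 domains at $p = 0.001$ (0.1% FPR), this gives $m \approx 14{,}378$ bits (1.8 KB) and $k = 10$ hash functions.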
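The sizing arithmetic can be checked in a few lines using the standard Bloom filter formulas, m = -n ln p / (ln 2)^2 and k = (m / n) ln 2. The function name is illustrative.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Return (bit count m, hash count k) for capacity n and target FPR p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


m_bits, k_hashes = bloom_parameters(1000, 0.001)
size_kb = m_bits / 8 / 1000  # about 1.8 KB
```

For 1,000 domains at a 0.1% false positive rate this yields 14,378 bits and 10 hash functions.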
The module keeps two separate Bloom filters: one for 15 known spam domains (spammer.com, phish-bait.com, etc.) and one for 30 disposable email providers (mailinator.com, guerrillamail.com, tempmail.com, etc.). The check returns a DomainVerdict: Clean, KnownSpam, or Disposable.
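A compact Python sketch of the two-filter check follows. It swaps AHash for `hashlib.blake2b` (so probe positions will not match the Rust implementation), forces the second hash to be odd, and assumes spam is checked before disposable; all class and function names are illustrative.

```python
import hashlib


class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _probes(self, item: bytes):
        """Yield k bit positions via double hashing: (h1 + i * h2) mod m."""
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd stride, never zero
        for i in range(self.k):
            yield ((h1 + i * h2) & 0xFFFFFFFFFFFFFFFF) % self.m

    def add(self, item: bytes) -> None:
        for pos in self._probes(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._probes(item))


def domain_verdict(domain: str, spam: BloomFilter, disposable: BloomFilter) -> str:
    """Return 'KnownSpam', 'Disposable', or 'Clean' (spam checked first, by assumption)."""
    key = domain.lower().encode()
    if key in spam:
        return "KnownSpam"
    if key in disposable:
        return "Disposable"
    return "Clean"
```

A Bloom filter can return false positives but never false negatives, so a listed domain is always caught, while a clean domain is occasionally misflagged at roughly the target rate.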
Zero-Copy Header FSM
The header_fsm.rs module (541 lines) parses raw email headers in a single pass with zero heap allocation. A finite state machine processes byte-by-byte, with length-based field name dispatch for performance:
fn identify_field(raw: &[u8], start: usize, end: usize) -> CurrentField {
    // Dispatch on field-name length first, then compare case-insensitively
    match end - start {
        4 => if starts_with_icase(raw, start, b"From") { From } else { Unknown },
        8 => { /* Reply-To, Received, X-Mailer */ }
        11 => { /* Return-Path */ }
        12 => { /* Content-Type */ }
        14 => { /* DKIM-Signature */ }
        16 => { /* List-Unsubscribe */ }
        22 => { /* Authentication-Results */ }
        _ => Unknown,
    }
}
The parser handles folded headers (continuation lines starting with space/tab), case-insensitive field matching, and both \n and \r\n line endings. The result borrows directly from the input buffer:
pub struct ParsedHeaders<'a> {
    pub spf_result: AuthResult,
    pub dkim_result: AuthResult,
    pub dmarc_result: AuthResult,
    pub from_domain: &'a str, // borrows from input
    pub reply_to_domain: Option<&'a str>,
    pub return_path_domain: Option<&'a str>,
    pub received_count: u8,
    pub has_list_unsubscribe: bool,
    pub x_mailer: Option<&'a str>,
    pub dkim_domain: Option<&'a str>,
    pub content_type: Option<&'a str>,
    pub is_multipart: bool,
}
Authentication-Results parsing extracts SPF, DKIM, and DMARC results from the value field by scanning for spf=, dkim=, dmarc= substrings and classifying the result token as Pass, Fail, SoftFail, or None. DKIM-Signature parsing extracts the signing domain from the d= field. Content-Type parsing detects multipart/* MIME types.
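The substring-scanning approach for Authentication-Results can be sketched in Python with a single regular expression. This is a simplification: the first recognized token per method wins, any token other than pass, fail, or softfail maps to None, and the function name is hypothetical.

```python
import re

AUTH_RE = re.compile(r"\b(spf|dkim|dmarc)=([a-z]+)", re.IGNORECASE)
TOKEN_MAP = {"pass": "Pass", "fail": "Fail", "softfail": "SoftFail"}


def parse_auth_results(value: str) -> dict[str, str]:
    """Classify SPF, DKIM and DMARC results from an Authentication-Results value."""
    results = {"spf": "None", "dkim": "None", "dmarc": "None"}
    for method, token in AUTH_RE.findall(value):
        key = method.lower()
        if results[key] == "None":
            results[key] = TOKEN_MAP.get(token.lower(), "None")
    return results
```

For example, `parse_auth_results("mx.example.com; spf=pass smtp.mailfrom=a.com; dkim=fail header.d=a.com; dmarc=none")` classifies SPF as Pass, DKIM as Fail, and DMARC as None.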
The zero-copy design means parsing a 2 KB header block involves no allocations — the ParsedHeaders struct is 128 bytes on the stack, and all string slices point into the input buffer. This is critical at batch scale: parsing 256 email headers should not produce 256 separate string allocations.
Results
The module is published at v9ai/salescue-spam-v1 on the Hugging Face Hub. The output includes 17 keys:
spam_score, spam_category, category_scores, ai_risk, ai_details,
header_verdict, deliverability, provider, provider_scores, risk_level,
risk_factors, token_spam_contributions, sentence_scores, gate_decision,
gate_confidence, aspect_scores, uncertainty
The full system — 1,352 lines of Python (6 sub-modules), 843 lines of Rust (classifier + batch + domain filter), 541 lines of Rust (header FSM) — runs the DeBERTa model on CPU in under 200ms per email for training and evaluation. The distilled Rust classifier processes a batch of 256 emails in under 1ms.
The key insight is that spam gating is not a classification problem; it is a resource allocation problem. Every false negative costs downstream compute. Every false positive costs a missed lead. The Bayesian uncertainty decomposition lets you tune this tradeoff explicitly: route high-epistemic-uncertainty emails to human review instead of auto-blocking them, and auto-block only when both aleatoric and epistemic uncertainty are low and the spam score is high.
The decision matrix:
| Aleatoric | Epistemic | Action |
|---|---|---|
| Low | Low | Trust the score — auto-pass or auto-block |
| High | Low | Genuinely ambiguous email — quarantine for human review |
| Low | High | Model hasn't seen this pattern — add to training data |
| High | High | Unknown and ambiguous — escalate immediately |
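The matrix translates directly into a routing function. The 0.5 cut-off separating "low" from "high" is a placeholder assumption, since the article does not give the actual thresholds, and the action labels are illustrative.

```python
def route(aleatoric: float, epistemic: float, threshold: float = 0.5) -> str:
    """Pick an action from the two uncertainty components (threshold is hypothetical)."""
    high_aleatoric = aleatoric >= threshold
    high_epistemic = epistemic >= threshold
    if not high_aleatoric and not high_epistemic:
        return "trust_score"            # auto-pass or auto-block
    if high_aleatoric and not high_epistemic:
        return "quarantine_for_review"  # genuinely ambiguous email
    if not high_aleatoric and high_epistemic:
        return "add_to_training_data"   # unfamiliar pattern
    return "escalate"                   # unknown and ambiguous
```

In practice the two thresholds would likely be tuned separately, because aleatoric and epistemic uncertainty are measured on different scales.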
The model, weights, and distillation pipeline are open source. The next step is calibrating the provider-specific models against real inbox placement data from Resend delivery webhooks.
End-to-End Gate Flow
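The complete gating pipeline from email ingestion to final verdict runs in order: Bloom-filter domain check, header parsing, 24-feature extraction, logistic scoring, and the Pass/Quarantine/Block decision.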
In production, the Rust path handles all traffic at batch speed. The DeBERTa path runs offline for training, evaluation, and generating soft labels for distillation. The Bloom filter short-circuits known spam domains before any feature extraction runs — the cheapest check first.
FAQ
What is the main advantage of multi-probe attention over a single attention head?
A single probe compresses all spam signals — keyword spam, AI-generated content, urgency manipulation, template structure — into one attention distribution. Multi-probe attention lets each aspect specialize independently. The learned gating mechanism automatically upweights whichever aspect is most relevant for a given email, and the per-aspect attribution enables interpretable false positive investigation.
Can this system detect AI-generated emails that have been paraphrased?
The 32-feature AI detection subsystem is designed to be robust against paraphrasing. Surface-level features (keyword matching) are easily defeated, but information-theoretic features — Yule's K, Shannon word entropy, trajectory smoothness, n-gram repetition patterns — capture statistical properties of the word frequency distribution that survive paraphrasing. The watermark detection head catches Kirchenbauer-style green-list bias when present.
Why distill to logistic regression instead of a smaller neural network?
Logistic regression has zero ML dependencies in Rust — no ONNX runtime, no tensor library, no BLAS. The 24-feature dot product runs in a tight loop that auto-vectorizes on ARM NEON and x86 SSE. At batch scale (256 emails), the entire scoring pass completes in under 1ms. A small neural network would require matrix multiplication infrastructure that adds complexity without proportional accuracy gain for the gating task.
How does the system handle emails it has never seen before?
Epistemic uncertainty — the mean Beta variance across token posteriors — directly measures model uncertainty about unfamiliar patterns. High epistemic uncertainty triggers quarantine rather than auto-blocking, and flags the email for human review and potential addition to the training set. This is fundamentally different from a confidence threshold on a point estimate, which cannot distinguish "I'm unsure because the email is ambiguous" from "I'm unsure because I haven't seen this pattern."
Does the system need retraining as spam tactics evolve?
Yes. The Bayesian priors adapt during training but not at inference time. The distillation pipeline (export → soft labels → logistic regression → Rust weights) is designed to run periodically as new labeled data accumulates. The temporal burst detector and campaign similarity detector provide cross-email signals that help detect novel campaign patterns even before retraining.
References
- Sahami, M., Dumais, S., Heckerman, D., & Horvitz, E. (1998). A Bayesian approach to filtering junk e-mail. AAAI Workshop on Learning for Text Categorization. Foundational paper on applying Naive Bayes classifiers to spam filtering.
- Kendall, A., Gal, Y., & Cipolla, R. (2018). Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. CVPR. The uncertainty-weighted multi-task loss used in training.
- Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. ICML. The green-list watermarking scheme detected by the watermark head.
- Yule, G. U. (1944). The Statistical Study of Literary Vocabulary. Cambridge University Press. Origin of Yule's K characteristic for vocabulary richness measurement.
- Honoré, A. (1979). Some simple measures of richness of vocabulary. Association for Literary and Linguistic Computing Bulletin, 7(2). The R statistic for length-independent vocabulary diversity.
