19 posts tagged with "machine learning"

Multi-Probe Bayesian Spam Gating: Filtering Junk Before Spending Compute

· 44 min read
Vadim Nicolai
Senior Software Engineer

In a B2B lead generation pipeline, every email that arrives costs compute. Scoring it for buyer intent, extracting entities, predicting reply probability, matching it against your ideal customer profile — each module is a DeBERTa forward pass. If 40% of inbound email is template spam, AI-generated slop, or mass-sent campaigns, you are burning 40% of your GPU budget on garbage.

The solution is a gating module: a spam classifier that sits at stage 2 of the pipeline and filters junk before anything else runs. But a binary spam/not-spam classifier is too blunt. You need to know why something is spam (template? AI-generated? role account?), how confident you are (is it ambiguous, or have you never seen this pattern before?), and which provider will block it (Gmail is stricter than Yahoo on link density).

This article documents a hierarchical Bayesian spam gating system with 4 aspect-specific attention probes, information-theoretic AI detection features, uncertainty decomposition, and a full Rust distillation path. The Python model trains on DeBERTa-v3-base. The Rust classifier runs at batch speed with 24 features and zero ML dependencies.
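The split between "ambiguous" and "never seen this pattern before" can be illustrated with the common entropy-based decomposition of predictive uncertainty. The excerpt does not say which decomposition the system uses, so the following is a minimal sketch under assumptions: `sampled_probs` is a hypothetical set of spam probabilities for one email (for example from posterior samples or ensemble members), and the function names are illustrative only.

```python
import numpy as np


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with parameter p."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def decompose_uncertainty(sampled_probs):
    """Split predictive uncertainty for one email into two parts.

    sampled_probs: hypothetical spam probabilities for the same email,
    one per posterior sample or ensemble member (an assumption here).

    Returns (total, aleatoric, epistemic):
      total     = entropy of the averaged prediction
      aleatoric = average entropy of each individual prediction
                  (the email itself is ambiguous)
      epistemic = total - aleatoric
                  (the samples disagree: an unfamiliar pattern)
    """
    probs = np.asarray(sampled_probs, dtype=float)
    total = float(binary_entropy(probs.mean()))
    aleatoric = float(binary_entropy(probs).mean())
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


# All samples agree the email is a coin flip: ambiguity, not novelty.
ambiguous = decompose_uncertainty([0.5, 0.5, 0.5, 0.5])

# Samples are individually confident but contradict each other: novelty.
novel = decompose_uncertainty([0.01, 0.99, 0.01, 0.99])
```

In the first case nearly all of the uncertainty is aleatoric; in the second nearly all of it is epistemic, even though both average to a spam probability of 0.5. A gate could route the two cases differently, for instance sending high-epistemic emails to review rather than dropping them.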

Building a ZoomInfo Alternative with Qwen and MLX: Local Buyer Intent Detection

· 11 min read
Vadim Nicolai
Senior Software Engineer

ZoomInfo charges $300+ per user per month for intent data — buying signals that tell sales teams which companies are actively in-market. It is the platform's number one feature and the reason enterprises pay six figures annually for access. But the underlying technology — classifying company content into intent categories — is a text classification problem. One that a 3-billion-parameter open-source model can solve on a single laptop.

From Research Papers to Production: ML Features Powering a Crypto Scalping Engine

· 33 min read
Vadim Nicolai
Senior Software Engineer

Every feature in a production trading system has an origin story — a paper, a theorem, a decades-old insight from probability theory or market microstructure. This post catalogs 14 ML features implemented in a Rust crypto scalping engine, traces each back to its foundational research, shows the actual formulas, and includes real production code. The engine processes limit order book (LOB) snapshots, trade ticks, and funding rate data in real time to generate scalping signals for crypto perpetual futures.

Understanding Score IC in Qlib for Enhanced Profit

· 7 min read
Vadim Nicolai
Senior Software Engineer

Introduction

One of the core ideas in quantitative finance is that model predictions, often called “scores”, can be mapped to expected returns on an instrument. In Qlib, these scores are evaluated with metrics such as the Information Coefficient (IC) and Rank IC, which measure how well the scores predict future returns. Essentially, the higher an instrument's score, the higher its expected return: if your IC is positive and statistically significant, the highest-scored stocks should, on average, outperform the lower-scored ones.
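To make the two metrics concrete, here is a minimal NumPy sketch for a single cross-section (one date). It assumes IC is the Pearson correlation between scores and realized future returns, and Rank IC is the same correlation computed on ranks (Spearman). The function names and the sample numbers are hypothetical, and this is not Qlib's own API.

```python
import numpy as np


def information_coefficient(scores, returns):
    """Pearson correlation between model scores and realized returns."""
    scores = np.asarray(scores, dtype=float)
    returns = np.asarray(returns, dtype=float)
    return float(np.corrcoef(scores, returns)[0, 1])


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def rank_information_coefficient(scores, returns):
    """Spearman correlation: Pearson correlation of the ranks."""
    return information_coefficient(average_ranks(scores), average_ranks(returns))


# Hypothetical scores and next-period returns for four instruments.
scores = [0.9, 0.1, 0.5, 0.7]
future_returns = [0.03, -0.02, 0.01, 0.10]

ic = information_coefficient(scores, future_returns)
rank_ic = rank_information_coefficient(scores, future_returns)
print(f"IC={ic:.3f}  Rank IC={rank_ic:.3f}")
```

Rank IC depends only on the ordering of the instruments, so it is less sensitive to a single extreme return than IC. In practice both are typically computed per date and then averaged over time, which is what makes a statement about statistical significance possible.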