Hybrid News Sentiment Engine:
Real-Time Market Analysis via Adaptive Ensemble Learning


We present a hybrid news sentiment engine that continuously learns market sentiment from news headlines paired with concurrent asset-price snapshots, without requiring any neural network training or GPU compute. The system polls the Tradeflags NewsFeed API every 15 minutes; the API provides 22 price-snapshot fields per news item, spanning equity indices, commodities, and cryptocurrencies.

Three-Way Ensemble Architecture

1. Financial Lexicon (FinBERT-style)
A domain-adapted lexicon of 248 financial terms with loss-aversion weighting. It scores a headline in ~0.1 ms on a single CPU core, costs nothing to run, and has no external dependencies.
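A minimal sketch of how a lexicon scorer with loss-aversion weighting could look. The six-term lexicon, the `LOSS_AVERSION` multiplier of 2.0, and the averaging and clamping rule are all illustrative assumptions; the engine's actual 248-term lexicon and weighting scheme are not reproduced here.

```python
import re

# Hypothetical mini-lexicon: term -> polarity in [-1, 1].
# Stands in for the real 248-term lexicon, which is not shown in this text.
LEXICON = {
    "surge": 0.8,
    "beat": 0.6,
    "rally": 0.7,
    "plunge": -0.8,
    "miss": -0.6,
    "default": -0.9,
}

# Assumed multiplier: negative terms count more than positive ones.
LOSS_AVERSION = 2.0


def lexicon_score(headline: str) -> float:
    """Return a sentiment score in [-1, 1] for one headline."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    total = 0.0
    hits = 0
    for tok in tokens:
        polarity = LEXICON.get(tok)
        if polarity is None:
            continue  # term not in the lexicon
        weight = LOSS_AVERSION if polarity < 0 else 1.0
        total += polarity * weight
        hits += 1
    if hits == 0:
        return 0.0  # no lexicon terms: neutral
    # Average over matched terms, then clamp to the output range.
    return max(-1.0, min(1.0, total / hits))
```

Under these assumptions a headline with one positive and one equally strong negative term scores below zero, which is the asymmetry that loss-aversion weighting is meant to capture.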

2. Adaptive Statistical Cluster Learner
The core innovation: headlines are vectorized via TF-IDF and grouped into semantic neighborhoods using greedy incremental clustering. Each cluster tracks the rolling average of the realized price reaction to that type of news. When market regimes shift, cluster centroids drift automatically, with no retraining needed.
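The clustering step described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the engine's code: the class name `ClusterLearner`, the similarity threshold of 0.3, the exponential-moving-average rate `ALPHA`, the smoothed IDF formula, and the use of an exponential moving average as the "rolling average" are all hypothetical choices.

```python
import math
import re
from collections import Counter

SIM_THRESHOLD = 0.3  # assumed: minimum cosine similarity to join a cluster
ALPHA = 0.2          # assumed: EMA rate for centroid and reaction updates


class ClusterLearner:
    def __init__(self):
        self.doc_freq = Counter()  # term -> number of headlines containing it
        self.n_docs = 0
        self.clusters = []         # dicts with "centroid", "reaction", "count"

    def _vectorize(self, headline, learn):
        """Unit-length TF-IDF vector (sparse dict) using the running IDF."""
        tf = Counter(re.findall(r"[a-z]+", headline.lower()))
        if learn:
            self.n_docs += 1
            self.doc_freq.update(tf.keys())
        vec = {
            t: c * (math.log((1 + self.n_docs) / (1 + self.doc_freq[t])) + 1.0)
            for t, c in tf.items()
        }
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    @staticmethod
    def _cosine(unit_vec, centroid):
        dot = sum(v * centroid.get(t, 0.0) for t, v in unit_vec.items())
        norm = math.sqrt(sum(v * v for v in centroid.values()))
        return dot / norm if norm else 0.0

    def _nearest(self, vec):
        best_i, best_sim = -1, 0.0
        for i, c in enumerate(self.clusters):
            sim = self._cosine(vec, c["centroid"])
            if sim > best_sim:
                best_i, best_sim = i, sim
        return best_i, best_sim

    def observe(self, headline, reaction):
        """Greedy step: join the nearest cluster or open a new one."""
        vec = self._vectorize(headline, learn=True)
        i, sim = self._nearest(vec)
        if i < 0 or sim < SIM_THRESHOLD:
            self.clusters.append({"centroid": vec, "reaction": reaction, "count": 1})
            return len(self.clusters) - 1
        c = self.clusters[i]
        terms = set(c["centroid"]) | set(vec)
        # The centroid drifts toward the new headline; so does the reaction.
        c["centroid"] = {
            t: (1 - ALPHA) * c["centroid"].get(t, 0.0) + ALPHA * vec.get(t, 0.0)
            for t in terms
        }
        c["reaction"] = (1 - ALPHA) * c["reaction"] + ALPHA * reaction
        c["count"] += 1
        return i

    def predict(self, headline):
        """Expected price reaction from the nearest cluster, or 0.0 if none fits."""
        vec = self._vectorize(headline, learn=False)
        i, sim = self._nearest(vec)
        if i >= 0 and sim >= SIM_THRESHOLD:
            return self.clusters[i]["reaction"]
        return 0.0
```

In this sketch, calling `observe(headline, realized_move)` once per news item is the only learning step, so regime drift is absorbed through the moving averages rather than through a separate retraining pass.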

3. Auto-Calibrating Ensemble
Every 6 hours, the system compares each signal's predicted sentiment against the actual price move. The Spearman correlation between the two drives weight reallocation: whichever signal best predicted recent moves gets a higher weight. The current weights are 45% statistical, 20% lexicon, and 35% LLM (the LLM signal is optional).
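One way the recalibration could work is sketched below. The Spearman computation is standard (Pearson correlation of average ranks), but the rule that maps correlations to weights is an assumption: here negative correlations are clipped to zero and the remainder is normalized to sum to 1. The function names `spearman` and `recalibrate` and the sample numbers are hypothetical.

```python
import math


def _ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def recalibrate(predictions, realized):
    """Turn each signal's correlation with realized moves into a weight.

    Assumed rule: clip negative correlations to zero, then normalize.
    Falls back to equal weights if no signal correlates positively.
    """
    scores = {
        name: max(0.0, spearman(preds, realized))
        for name, preds in predictions.items()
    }
    total = sum(scores.values())
    if total == 0:
        return {name: 1.0 / len(scores) for name in scores}
    return {name: s / total for name, s in scores.items()}


# Illustrative inputs only: four headlines, three signals.
realized = [0.5, -0.2, 0.1, -0.4]
predictions = {
    "statistical": [0.4, -0.1, 0.2, -0.3],
    "lexicon": [-0.4, 0.1, -0.2, 0.3],
    "llm": [0.3, 0.2, -0.1, -0.5],
}
weights = recalibrate(predictions, realized)
```

The ensemble score for a headline would then be the weighted sum of the individual signal scores. Rank correlation is a reasonable fit here because it rewards a signal for ordering moves correctly without requiring its scores to be on the same scale as price changes.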

Key Results

Zero marginal cost: The entire pipeline runs on a single CPU server at sub-2-second latency per batch. By comparison, GPT-4 costs $30K-90K per million headlines, and FinBERT requires a GPU.

Self-improving: The TF-IDF cluster learner adapts to new market regimes automatically. When "rate hike" headlines start producing different reactions than they did six months ago, the cluster's average price response drifts with each new data point.

Live deployment: The engine runs as a cron pipeline polling every 15 minutes, with an interactive HTML gauge at tradeflags.com showing live aggregate sentiment, signal breakdown, and cross-asset snapshots.

Novel comparison: In a survey of 6 existing approaches (FinBERT, GPT-4, FinLlama, VADER, Bloomberg Sentiment, Alpha Vantage) across 9 dimensions, our system is the only one that simultaneously achieves zero cost, CPU-only inference, regime adaptability, and price-based calibration.

Showcase

Live Gauge at www.tradeflags.com

Download the full paper (PDF):