aa-news-encoder
Turkish news classification pipeline — AA API ingestion, Kafka, fine-tuned BERT, REST/SSE API, and dashboard.
April 30, 2026 · Public · Completed · nlp · turkish · bert · kafka
Overview
Proof-of-concept for classifying Turkish news articles into 7 categories using a fine-tuned dbmdz/bert-base-turkish-cased model (~87% accuracy). Articles are ingested from the Anadolu Ajansı (AA) API, passed through a Kafka pipeline, and classified by the model, which is served over FastAPI and gRPC. Results are exposed through a NestJS REST/SSE API and a TanStack React dashboard.
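The Overview describes the model's job in one phrase: "classifying into 7 categories." The sketch below makes that concrete by showing how a sequence-classification head's raw output is usually turned into a label. The model emits one logit per class, softmax converts the logits to probabilities, and the highest-probability class is the prediction. With Hugging Face Transformers, the logits would come from the `.logits` field of a sequence-classification model's output. The generic `class_0` to `class_6` labels and the example logits are placeholders, because the project's real category names are not listed here.

```python
import math
from typing import Sequence

# Placeholder labels: the project's actual 7 category names are not given here.
LABELS = [f"class_{i}" for i in range(7)]


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: Sequence[float]) -> tuple[str, float]:
    """Return the top label and its probability for one article's logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    # Made-up logits for a single article, for illustration only.
    label, confidence = predict([0.1, 2.3, -1.0, 0.4, 0.0, -0.5, 1.1])
    print(label, round(confidence, 3))
```

Returning the probability together with the label lets downstream services store a confidence score next to each category.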
Tech Stack
| Layer | Technology |
|---|---|
| Services | NestJS, Python / FastAPI |
| Messaging / protocols | Kafka, gRPC, REST, SSE |
| Storage | PostgreSQL, Redis |
| Model | HuggingFace Transformers, Turkish BERT |
| Frontend | React, TanStack Router / Table |
| Infrastructure | Docker Compose |
Architecture
AA API → Producer → Kafka → Consumer → Model (FastAPI + gRPC)
→ PostgreSQL + Redis → API (REST + SSE) → Dashboard
Key Features
- Scheduled ingestion — polls the AA News API, with Redis-based deduplication (SHA-256 hashes, 3-hour TTL).
- Fine-tuned BERT — 7-category Turkish news classifier served over FastAPI and gRPC.
- Event-driven pipeline — producer, consumer, and model as separate services.
- Management dashboard — magic-link auth with live SSE updates.
- Dataset CLI — local tool for collecting training data from Turkish news sources.
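The deduplication step in the ingestion bullet can be sketched as follows. The sketch assumes that each article is fingerprinted by hashing some stable fields with SHA-256 and that the digest is remembered for 3 hours. The fields used here (a hypothetical `id` plus `title`) and the `aa:seen:` key prefix are illustrative assumptions, because the real key inputs are not stated. A tiny in-memory `TTLSet` stands in for Redis so the example runs without a server. With redis-py, the same check-and-mark step would be a single atomic `r.set(key, 1, nx=True, ex=10800)`, which returns a truthy value only when the key was newly set.

```python
import hashlib
import time
from typing import Callable

DEDUP_TTL_SECONDS = 3 * 60 * 60  # 3-hour TTL, matching the feature description


def fingerprint(article_id: str, title: str) -> str:
    """SHA-256 digest of (hypothetical) identifying fields, used as the dedup key."""
    payload = f"{article_id}\x1f{title}".encode("utf-8")
    return "aa:seen:" + hashlib.sha256(payload).hexdigest()


class TTLSet:
    """In-memory stand-in for Redis `SET key 1 NX EX ttl` (not thread-safe)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expiry: dict[str, float] = {}
        self._clock = clock

    def add_if_absent(self, key: str, ttl: float) -> bool:
        """Return True if the key was new (or expired) and is now stored."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # seen within the TTL window: treat as a duplicate
        self._expiry[key] = now + ttl
        return True


def filter_new(articles: list[dict], seen: TTLSet) -> list[dict]:
    """Keep only articles whose fingerprint was not seen in the last 3 hours."""
    return [
        a for a in articles
        if seen.add_if_absent(fingerprint(a["id"], a["title"]), DEDUP_TTL_SECONDS)
    ]
```

The TTL lets memory stay bounded. A story that reappears after the window has passed is treated as new again.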
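The hand-offs between stages in the Architecture flow and the event-driven bullet can be illustrated with an in-process sketch. A `queue.Queue` stands in for the Kafka topic, a thread plays the consumer service, a stub function replaces the BERT model, and a dict replaces PostgreSQL. The sketch shows only how data moves between stages. The message fields (`id`, `text`), the sentinel shutdown, and the event shape are assumptions. A real deployment would use a Kafka client, a gRPC or HTTP call to the model service, and a database.

```python
import json
import queue
import threading
from typing import Callable

SENTINEL = None  # end-of-stream marker for this demo; real Kafka consumers run indefinitely


def producer(articles: list[dict], topic: "queue.Queue[str | None]") -> None:
    """Serialize each article to JSON and publish it, as a Kafka producer would."""
    for article in articles:
        # ensure_ascii=False keeps Turkish characters readable in the payload
        topic.put(json.dumps(article, ensure_ascii=False))
    topic.put(SENTINEL)


def consumer(
    topic: "queue.Queue[str | None]",
    classify: Callable[[str], str],
    store: dict[str, str],
    events: "queue.Queue[dict]",
) -> None:
    """Consume messages, classify them, persist the result, and emit an update event."""
    while (message := topic.get()) is not SENTINEL:
        article = json.loads(message)
        category = classify(article["text"])  # in the project: call to the model service
        store[article["id"]] = category  # in the project: PostgreSQL (+ Redis)
        events.put({"id": article["id"], "category": category})  # later pushed over SSE


def run_pipeline(
    articles: list[dict], classify: Callable[[str], str]
) -> tuple[dict[str, str], list[dict]]:
    """Wire producer -> topic -> consumer -> store/events and wait for completion."""
    topic: "queue.Queue[str | None]" = queue.Queue()
    events: "queue.Queue[dict]" = queue.Queue()
    store: dict[str, str] = {}
    worker = threading.Thread(target=consumer, args=(topic, classify, store, events))
    worker.start()
    producer(articles, topic)  # the producer runs in the main thread here
    worker.join()
    drained = []
    while not events.empty():
        drained.append(events.get_nowait())
    return store, drained
```

Splitting the stages this way means the producer never waits on the model. Each stage can be restarted or scaled independently, which is the motivation for running producer, consumer, and model as separate services.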
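The dashboard's live updates arrive as Server-Sent Events, and each push is a plain `text/event-stream` message. The sketch below shows the framing defined by the SSE specification: optional `id:` and `event:` fields, one `data:` line per payload line, and a blank line that terminates the event. It is written in Python for consistency with the other examples. In the project's NestJS API, a controller method decorated with `@Sse()` that returns an Observable would handle this framing. The `classified` event name and its JSON payload are hypothetical.

```python
import json
import re
from typing import Optional

_LINE_BREAK = re.compile(r"\r\n|\r|\n")  # SSE treats CRLF, CR, and LF as line ends


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one Server-Sent Events message in text/event-stream format."""
    for name, value in (("id", event_id), ("event", event)):
        if value is not None and _LINE_BREAK.search(value):
            raise ValueError(f"SSE {name} field must not contain line breaks")
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each payload line becomes its own "data:" field; the client rejoins them with "\n".
    lines.extend(f"data: {line}" for line in _LINE_BREAK.split(data))
    # A blank line (two consecutive newlines) dispatches the event to the browser.
    return "\n".join(lines) + "\n\n"


def classification_event(article_id: str, category: str) -> str:
    """Hypothetical 'classified' event carrying a JSON payload for the dashboard."""
    payload = json.dumps({"id": article_id, "category": category}, ensure_ascii=False)
    return format_sse(payload, event="classified", event_id=article_id)
```

In the browser, an `EventSource` listener registered for the `classified` event type would receive the JSON string and could update the table row in place.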
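The feature list says only "magic-link auth." One common way to build it, sketched here under assumptions and not taken from the project's documented scheme, is an HMAC-signed, expiring token embedded in the emailed link. The server signs the user's email and an expiry time, and later accepts the link only if the signature matches and the link has not expired. The in-process secret, the 15-minute lifetime, and the `body.signature` token layout are hypothetical. The example is written in Python, although the API layer is NestJS.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # hypothetical; load from configuration in practice
LINK_TTL_SECONDS = 15 * 60  # hypothetical 15-minute link lifetime


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(body: str) -> str:
    return _b64(hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).digest())


def issue_token(email: str, now: Optional[float] = None) -> str:
    """Create a URL-safe token binding an email address to an expiry time."""
    issued = time.time() if now is None else now
    claims = {"email": email, "exp": int(issued) + LINK_TTL_SECONDS, "nonce": secrets.token_hex(8)}
    body = _b64(json.dumps(claims, separators=(",", ":")).encode("utf-8"))
    return f"{body}.{_sign(body)}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the email if the signature is valid and the link has not expired, else None."""
    try:
        body, sig = token.split(".")
    except ValueError:
        return None
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode("utf-8"), _sign(body).encode("ascii")):
        return None
    claims = json.loads(_unb64(body))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        return None
    return claims["email"]
```

A production version would also mark each token as used, for example by storing its nonce in Redis with the same TTL, so that a link cannot be replayed within its lifetime.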