Rauf

aa-news-encoder

Turkish news classification pipeline — AA API ingestion, Kafka, fine-tuned BERT, REST/SSE API, and dashboard.

April 30, 2026 · Public · Completed · nlp · turkish · bert · kafka

Overview

Proof-of-concept for classifying Turkish news articles into 7 categories with a fine-tuned dbmdz/bert-base-turkish-cased model (~87% accuracy). Articles are ingested from the Anadolu Ajansı (AA) API and processed through a Kafka pipeline. The model is served via FastAPI and gRPC, and results are exposed through a NestJS REST/SSE API and a React dashboard built with TanStack.

Tech Stack

Layer            Technology
Services         NestJS, Python / FastAPI
Messaging        Kafka, gRPC, REST, SSE
Storage          PostgreSQL, Redis
Model            HuggingFace Transformers, Turkish BERT
Frontend         React, TanStack Router / Table
Infrastructure   Docker Compose

Architecture

AA API → Producer → Kafka → Consumer → Model (FastAPI + gRPC)
       → PostgreSQL + Redis → API (REST + SSE) → Dashboard
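The producer/consumer split above can be sketched in-process in Python. This is a minimal illustration under stated assumptions and is not the project's code. A `queue.Queue` stands in for the Kafka topic. The classifier is injected as a plain callable in place of the FastAPI/gRPC model service. A dict stands in for PostgreSQL. `Article`, `Prediction`, `produce`, and `consume` are hypothetical names.

```python
# Minimal in-process sketch of Producer -> Kafka -> Consumer -> Model -> storage.
# Assumptions (hypothetical, not the project's code): queue.Queue stands in for a
# Kafka topic, `classify` stands in for the FastAPI/gRPC model service, and a dict
# stands in for PostgreSQL.
from __future__ import annotations

import queue
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Article:
    article_id: str
    title: str
    body: str


@dataclass
class Prediction:
    article_id: str
    category: str
    confidence: float


def produce(articles: list[Article], topic: queue.Queue[Optional[Article]]) -> None:
    """Producer: publish fetched articles to the topic, then a stop sentinel."""
    for article in articles:
        topic.put(article)
    topic.put(None)  # sentinel for this sketch; a real Kafka consumer polls indefinitely


def consume(
    topic: queue.Queue[Optional[Article]],
    classify: Callable[[Article], tuple[str, float]],
    store: dict[str, Prediction],
) -> None:
    """Consumer: read each article, ask the model for a label, persist the result."""
    while True:
        article = topic.get()
        if article is None:
            break
        category, confidence = classify(article)
        store[article.article_id] = Prediction(article.article_id, category, confidence)
```

The producer only writes to the topic, and the consumer only reads from it. As a result, either side can be restarted or scaled without touching the other, which is the reason to run them as separate services.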

Key Features

  • Scheduled ingestion: periodically polls the AA News API and deduplicates articles in Redis using SHA-256 hashes with a 3-hour TTL.
  • Fine-tuned BERT: a 7-category Turkish news classifier served over FastAPI and gRPC.
  • Event-driven pipeline: the producer, consumer, and model run as separate services.
  • Management dashboard: magic-link authentication and live updates over SSE.
  • Dataset CLI: a local tool for collecting training data from Turkish news sources.
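The ingestion deduplication can be sketched as follows. The sketch assumes the key is a SHA-256 digest of the article's title and body, since the project does not say which fields are hashed. It also uses an in-memory dict of expiry times as a stand-in for Redis. Against a real Redis instance, the same check-and-record is one atomic `SET key value NX EX 10800`. In redis-py that is `r.set(key, 1, nx=True, ex=10800)`, which returns a truthy value only when the key was newly set. `article_key`, `TTLSeenSet`, and the `seen:` prefix are hypothetical.

```python
# Sketch of Redis-style deduplication: SHA-256 keys with a 3-hour TTL.
# Assumptions (hypothetical): the hash covers title + body, and an in-memory dict of
# expiry times stands in for Redis. With real Redis, add_if_new() corresponds to
# SET key value NX EX 10800.
import hashlib
import json
import time
from typing import Callable

TTL_SECONDS = 3 * 60 * 60  # 3h TTL, as described above


def article_key(title: str, body: str) -> str:
    """Hash the article text; JSON-encoding the pair keeps field boundaries unambiguous."""
    payload = json.dumps([title, body], ensure_ascii=False)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"seen:{digest}"  # hypothetical key prefix


class TTLSeenSet:
    """In-memory stand-in for Redis keys that expire after `ttl` seconds."""

    def __init__(self, ttl: float = TTL_SECONDS, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._expires_at: dict[str, float] = {}

    def add_if_new(self, key: str) -> bool:
        """Record `key` and return True unless it is already present and unexpired."""
        now = self._clock()
        expires_at = self._expires_at.get(key)
        if expires_at is not None and expires_at > now:
            return False  # duplicate within the TTL window: skip publishing
        self._expires_at[key] = now + self._ttl
        return True
```

Unlike Redis `SET NX`, this in-memory version is neither shared across processes nor thread-safe. It only illustrates the semantics: an article seen again within 3 hours is skipped, and it becomes eligible again once the key expires.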

Resources