A toy example of a Kafka to OpenSearch / ElasticSearch pipeline for WikiMedia data written in pure FP Scala 3 with Cats Effect, fs2-kafka and opensearch-java client.
It consists of:
producer
: a Kafka producer that reads from the WikiMedia event stream and writes to a kafka topicconsumer
: a Kafka consumer that reads from a Kafka topic and writes to OpenSearch
- Install scala-cli
- Start VM (optional)
# MacOS only, not needed if you have Docker Desktop or similar ./colima.sh
- Run local infra.
docker-compose -f ./docker-compose.yml up
- Run the app
# start producer process scala-cli ./sampler -- produce # start consumer process scala-cli ./sampler -- consume # run the producer & consumer processes concurrently scala-cli ./sampler -- produce-consume # for more options scala-cli ./sampler -- help
- You can inspect processed data in:
- Kafka UI available at http://localhost:8080
- OpenSearch-Dashboards / Kibana console available at http://localhost:5601/app/dev_tools#/console
- parametrize more OpenSearch client options and move them to the CLI level
- use Avro or other format instead of JSON
- kafka consumer graceful shutdown
- use explicit mapping instead of dynamic one in OpenSearch
- create Grafana / Kibana dashboards
- add more tests