Universal AutoML Pipeline

An autonomous MLOps agent powered by Kestra & Docker.

⚡ Kestra Orchestration 🐳 Docker Isolated 🐍 Python 3.11 🤖 Scikit-Learn

Live Code from Repository

1. Ingest etl/download.py

Accepts any CSV URL and streams the download to disk, sending browser-like request headers so open-data portals don't reject it as a bot.

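The script body isn't embedded on this page, so here is a minimal sketch of the ingest step, not the repository's actual code. It assumes the stdlib `urllib`; the helper name `download_csv` and the header string are illustrative.

```python
import sys
import urllib.request

# Browser-like User-Agent: many open-data portals reject the default
# Python client string as an automated request.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; automl-pipeline)"}

def download_csv(url: str, dest: str) -> None:
    """Stream the CSV to disk in chunks so large files never sit in memory."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=60) as resp, open(dest, "wb") as out:
        while chunk := resp.read(1 << 16):  # 64 KiB chunks
            out.write(chunk)

if __name__ == "__main__":
    if len(sys.argv) == 3:
        download_csv(sys.argv[1], sys.argv[2])
```

Streaming in fixed-size chunks keeps memory flat regardless of dataset size, which matters for city open-data dumps that can run to hundreds of megabytes.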

2. Clean & Encode etl/features.py

Auto-detects the target column by fuzzy-matching the requested name against the dataset's actual headers, then cleans and encodes the data for training.

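Again the script itself isn't shown, so this is a sketch of just the fuzzy target-column resolution described above, built on the stdlib `difflib`. The function name `resolve_target` and the 0.6 cutoff are assumptions, not necessarily what the repository uses.

```python
import difflib

def resolve_target(requested: str, columns: list[str]) -> str:
    """Map a user-supplied target name onto an actual dataset column.

    An exact match wins; otherwise fall back to case-insensitive fuzzy
    matching so "tree helth" still resolves to "Tree Health".
    """
    if requested in columns:
        return requested
    lowered = [c.lower() for c in columns]
    matches = difflib.get_close_matches(requested.lower(), lowered, n=1, cutoff=0.6)
    if not matches:
        raise ValueError(f"no column resembling {requested!r} in {columns}")
    # Return the original-cased header, since downstream code indexes by it.
    return columns[lowered.index(matches[0])]
```

Resolving the name once, up front, means a typo in the flow input fails fast with a clear error instead of producing an empty training set.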

3. Train etl/train.py

Trains a Random Forest Classifier and outputs JSON metrics for Discord.

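A sketch of the training step under stated assumptions: pandas and scikit-learn are available, the cleaned CSV is fully numeric, and the encoded target lives in a column whose name is passed in (defaulted here to a hypothetical `target`). The function name, metric choices, and hyperparameters are illustrative.

```python
import json
import sys

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

def train(csv_path: str, metrics_path: str, target: str = "target") -> dict:
    """Fit a Random Forest on the cleaned CSV and write JSON metrics."""
    df = pd.read_csv(csv_path)
    X, y = df.drop(columns=[target]), df[target]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_tr, y_tr)
    preds = model.predict(X_te)
    metrics = {
        "accuracy": round(float(accuracy_score(y_te, preds)), 4),
        "f1_weighted": round(float(f1_score(y_te, preds, average="weighted")), 4),
    }
    # The JSON file on disk doubles as the Discord notification payload.
    with open(metrics_path, "w") as f:
        json.dump(metrics, f)
    return metrics

if __name__ == "__main__":
    if len(sys.argv) >= 3:
        train(sys.argv[1], sys.argv[2])
```

Writing metrics as JSON rather than free text is what lets the orchestrator below read the file verbatim into a webhook payload.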

The Orchestration Logic (YAML)

This pipeline connects the Python scripts above into a cohesive workflow.

id: universal_automl_pipeline
namespace: buffalo
inputs:
  - id: dataset_url
    type: STRING
    defaults: "https://data.buffalony.gov/api/views/n4ni-uuec/rows.csv?accessType=DOWNLOAD"
  - id: target_column
    type: STRING
    defaults: "Tree Health"

tasks:
  - id: download_generic
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: python:3.11-slim
    namespaceFiles:
      enabled: true  # makes the etl/ scripts available inside the container
    commands:
      - python etl/download.py "{{ inputs.dataset_url }}" "raw.csv"
    outputFiles:
      - "raw.csv"

  - id: auto_clean_features
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: python:3.11-slim
    namespaceFiles:
      enabled: true
    inputFiles:
      raw.csv: "{{ outputs.download_generic.outputFiles['raw.csv'] }}"
    commands:
      - python etl/features.py "raw.csv" "clean.csv" "{{ inputs.target_column }}"
    outputFiles:
      - "clean.csv"

  - id: auto_train_model
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: python:3.11-slim
    namespaceFiles:
      enabled: true
    inputFiles:
      clean.csv: "{{ outputs.auto_clean_features.outputFiles['clean.csv'] }}"
    commands:
      - python etl/train.py "clean.csv" "metrics.txt"
    outputFiles:
      - "metrics.txt"

  - id: notify_discord
    type: io.kestra.plugin.notifications.discord.DiscordIncomingWebhook
    url: "{{ secret('DISCORD_WEBHOOK_URL') }}"  # secret name is illustrative
    payload: "{{ read(outputs.auto_train_model.outputFiles['metrics.txt']) }}"