COE-overview

Weekly aggregator for security and operational errors from Jira, Wiz, CrowdStrike, and Vibranium.

https://github.com/davidbmar/COE-overview  ·  public  ·  shipped

What it is

A Python-based ETL pipeline that ingests error events from multiple security and operations platforms (Jira, Wiz, CrowdStrike, Vibranium), normalizes them into a PostgreSQL database, and prepares data for weekly reporting. It includes schema management via Alembic and supports Kubernetes CronJob execution patterns.
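
The normalization step described above can be sketched as follows. This is a minimal illustration, not the project's code: the `NormalizedEvent` shape, the function names, and the payload keys (`key`, `priority`, `created`, `createdAt`) are assumptions, and the real columns are defined in `coe/db/models.py`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedEvent:
    """Hypothetical common shape shared by every source."""
    source: str
    external_id: str
    severity: str
    occurred_at: datetime


def normalize_jira(raw: dict) -> NormalizedEvent:
    # Assumed Jira-like keys; illustrative only.
    return NormalizedEvent(
        source="jira",
        external_id=raw["key"],
        severity=raw.get("priority", "unknown").lower(),
        occurred_at=datetime.fromisoformat(raw["created"]).astimezone(timezone.utc),
    )


def normalize_wiz(raw: dict) -> NormalizedEvent:
    # Assumed Wiz-like keys; illustrative only.
    return NormalizedEvent(
        source="wiz",
        external_id=raw["id"],
        severity=raw.get("severity", "unknown").lower(),
        occurred_at=datetime.fromisoformat(raw["createdAt"]).astimezone(timezone.utc),
    )


# Two differently shaped payloads end up in the same structure.
events = [
    normalize_jira({"key": "OPS-1", "priority": "High", "created": "2024-01-01T12:00:00+00:00"}),
    normalize_wiz({"id": "wiz-9", "severity": "CRITICAL", "createdAt": "2024-01-02T08:30:00+00:00"}),
]
```

Keeping one small function per source means a change in one vendor's payload touches only that function, while everything downstream sees a single shape.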

Quickstart

pip install -e .
export DATABASE_URL=postgresql+asyncpg://user:pass@localhost/db
export JIRA_API_TOKEN=your_token
export WIZ_CLIENT_ID=your_id
python -m coe
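
Before the first run, it can help to confirm that the variables above are actually set. The check below is a hedged sketch using only the standard library. The variable names come from the Quickstart, while `REQUIRED_VARS` and `missing_vars` are hypothetical names, and the project itself may require more variables than the three shown here.

```python
from typing import List, Mapping

# Names taken from the Quickstart; other sources (CrowdStrike, Vibranium)
# probably need credentials too, but the source does not list them.
REQUIRED_VARS = ("DATABASE_URL", "JIRA_API_TOKEN", "WIZ_CLIENT_ID")


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with only the database URL configured.
example = missing_vars({"DATABASE_URL": "postgresql+asyncpg://user:pass@localhost/db"})
```

In real use, pass `os.environ` instead of a literal dictionary and stop early if the returned list is not empty.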

Architecture

flowchart TD
    A[External Sources] -->|Ingest| B(COE Pipeline)
    B --> C[PostgreSQL Database]
    C --> D[Alembic Migrations]
    B --> E[Run ID Output]
    E --> F[Downstream Renderer]
    subgraph Sources
    A1[Jira]
    A2[Wiz]
    A3[CrowdStrike]
    A4[Vibranium]
    end
    A1 --> A
    A2 --> A
    A3 --> A
    A4 --> A

How it's built

Built with Python using SQLAlchemy (async) for database interactions, Pydantic Settings for configuration management, and Structlog for logging. It uses Alembic for database migrations and async/await patterns for concurrent data ingestion from external APIs.
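
To illustrate the concurrent ingestion pattern, here is a small sketch built on `asyncio.gather`. The `fetch_source` and `ingest_all` functions are hypothetical stand-ins: they sleep instead of calling a real API, and the project's actual clients are not shown in the source.

```python
import asyncio
from typing import Dict, List


async def fetch_source(name: str, delay: float) -> List[Dict[str, str]]:
    """Stand-in for a real API client; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return [{"source": name, "id": f"{name}-1"}]


async def ingest_all() -> List[Dict[str, str]]:
    # gather() runs the four fetches concurrently and returns results
    # in the order the awaitables were passed, not in completion order.
    batches = await asyncio.gather(
        fetch_source("jira", 0.02),
        fetch_source("wiz", 0.01),
        fetch_source("crowdstrike", 0.03),
        fetch_source("vibranium", 0.01),
    )
    # Flatten the per-source batches into one list of raw events.
    return [event for batch in batches for event in batch]


raw_events = asyncio.run(ingest_all())
```

Passing `return_exceptions=True` to `asyncio.gather` would return a failure from one source as a result value instead of raising it, which may suit a weekly job that should still record the sources that succeeded.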

How it runs

sequenceDiagram
    participant CLI as __main__.py
    participant Config as config.py
    participant DB as SQLAlchemy Engine
    participant Pipe as pipeline.run
    participant Src as External APIs
    CLI->>Config: get_settings()
    CLI->>DB: create_async_engine()
    CLI->>Pipe: run(session_factory, settings)
    Pipe->>Src: Fetch Jira Issues
    Pipe->>Src: Fetch Wiz Findings
    Pipe->>Src: Fetch CrowdStrike Detections
    Pipe->>Src: Fetch Vibranium Events
    Src-->>Pipe: Raw Data
    Pipe->>DB: Normalize & Insert Events
    DB-->>Pipe: Confirm Write
    Pipe-->>CLI: Return Result (run_id, status)
    CLI->>CLI: Write run_id to file
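
The sequence above can be approximated with the standard library alone. This sketch only mirrors the control flow. `RunResult`, the `"ok"` status value, and the `run_id.txt` file name are assumptions, and `run` is a stub for `pipeline.run` rather than its real implementation.

```python
import asyncio
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunResult:
    """Hypothetical result shape carrying the two fields named in the diagram."""
    run_id: str
    status: str


async def run(session_factory, settings) -> RunResult:
    # Stub: the real pipeline fetches, normalizes, and inserts events here.
    return RunResult(run_id=str(uuid.uuid4()), status="ok")


def main(run_id_path: Path) -> RunResult:
    settings = {}           # placeholder for get_settings()
    session_factory = None  # placeholder for a factory bound to the async engine
    result = asyncio.run(run(session_factory, settings))
    # Persist the run_id so a later step can pick it up.
    run_id_path.write_text(result.run_id + "\n", encoding="utf-8")
    return result


path = Path(tempfile.mkdtemp()) / "run_id.txt"
result = main(path)
```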

How to apply & reuse

Configure environment variables for the source APIs (Jira, Wiz, CrowdStrike, Vibranium) and the database connection. Run the pipeline via the CLI entrypoint (python -m coe) to ingest data, then pass the run_id that the CLI writes to a file to downstream reporting or rendering processes.
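
A downstream step might pick up the stored run_id along these lines. This is a sketch under stated assumptions: the file name `run_id.txt`, the `render_report` module, and its `--run-id` flag are invented for illustration, since the source does not describe the renderer's interface.

```python
import tempfile
from pathlib import Path
from typing import List


def read_run_id(path: Path) -> str:
    """Return the stored run_id, failing loudly if the file is empty."""
    run_id = path.read_text(encoding="utf-8").strip()
    if not run_id:
        raise ValueError(f"{path} is empty; did the pipeline finish?")
    return run_id


def renderer_command(run_id: str) -> List[str]:
    # "render_report" and "--run-id" are made-up names for the downstream renderer.
    return ["python", "-m", "render_report", "--run-id", run_id]


# Demo with a temporary file standing in for the pipeline's output.
demo = Path(tempfile.mkdtemp()) / "run_id.txt"
demo.write_text("1234-abcd\n", encoding="utf-8")
command = renderer_command(read_run_id(demo))
```

Building the command as a list rather than a single string keeps it safe to hand to `subprocess.run` without shell quoting.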

At a glance

Capabilities: Multi-source data ingestion, Data normalization, Asynchronous processing, Database migration management, Environment-based configuration, Structured logging
Components: coe/__main__.py, coe/pipeline.py, coe/config.py, coe/db/models.py, alembic/env.py, coe/log_capture.py
Tech: Python, SQLAlchemy, Alembic, Pydantic, AsyncIO, Structlog, PostgreSQL
Depends on: asyncpg, pydantic-settings, sqlalchemy, alembic, structlog
Integrates with: Jira, Wiz, CrowdStrike, Vibranium, Google Docs, Kubernetes CronJobs
Patterns: ETL Pipeline, Repository Pattern, Dependency Injection, Configuration via Environment, Async/Await Concurrency
Reuse tags: security-ops, data-aggregation, weekly-reporting, python-etl, k8s-cronjob

Repo hygiene

✓ all on main — nothing unmerged.