Walk through submission, preflight, run tracking, and diagnostics with your team, using your own workload shape.
Govern Spark runs on AWS without building a control plane from scratch.
SparkPilot is built for platform teams running Spark on AWS. You get preflight safety checks, run diagnostics, and cost visibility in one place while keeping infrastructure and data inside your AWS account.
What you can review before rollout
Start with a live demo, then use pilot artifacts to align technical, security, and buyer stakeholders.
Redacted screenshots and run summaries are shared during active pilot evaluations.
Short onboarding and run-operations clips are coming soon.
What teams replace in week one
Platform teams running Spark on shared EKS clusters hit the same operational bottlenecks. SparkPilot replaces manual run prep with a governed workflow.
Core capabilities for pilot and rollout
These are the capabilities teams use first. Each capability includes an availability label so teams can plan rollout clearly.
Preflight Safety Gates
IAM, IRSA, OIDC, resource quota, and Spot capacity checks run before a single byte moves. Lake Formation permission checks are available in beta and run when enabled. Invalid configurations are blocked before dispatch, with clear remediation steps so teams can fix issues first.
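The sketch below shows one way a preflight gate can block dispatch. It assumes each check is a plain function that returns pass/fail plus a remediation hint. The `RunRequest` fields, the check names, and the quota numbers are hypothetical illustrations, not SparkPilot's actual API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunRequest:
    # Hypothetical request fields, for illustration only.
    execution_role_arn: str
    oidc_provider_associated: bool
    requested_vcpu: int
    namespace_vcpu_quota: int


@dataclass
class CheckResult:
    name: str
    passed: bool
    remediation: str = ""


def check_execution_role(req: RunRequest) -> CheckResult:
    ok = req.execution_role_arn.startswith("arn:aws:iam::")
    return CheckResult("iam_role", ok, "" if ok else "Set a valid IAM execution role ARN.")


def check_oidc(req: RunRequest) -> CheckResult:
    ok = req.oidc_provider_associated
    return CheckResult("oidc", ok, "" if ok else "Associate the cluster's OIDC provider with IAM.")


def check_quota(req: RunRequest) -> CheckResult:
    ok = req.requested_vcpu <= req.namespace_vcpu_quota
    hint = f"Request at most {req.namespace_vcpu_quota} vCPU or raise the namespace quota."
    return CheckResult("resource_quota", ok, "" if ok else hint)


CHECKS: list[Callable[[RunRequest], CheckResult]] = [check_execution_role, check_oidc, check_quota]


def preflight(req: RunRequest) -> list[CheckResult]:
    """Run every check and return only the failures; an empty list means dispatch may proceed."""
    return [result for result in (check(req) for check in CHECKS) if not result.passed]


if __name__ == "__main__":
    request = RunRequest("arn:aws:iam::111122223333:role/spark-exec", True, 96, 64)
    for failure in preflight(request):
        print(f"BLOCKED [{failure.name}]: {failure.remediation}")
```

The gate runs every check rather than stopping at the first failure. That way, a single preflight pass reports all the fixes a team needs.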
CUR-Aligned Cost Attribution
SparkPilot provides per-run cost estimates before dispatch and can reconcile against AWS Cost and Usage Report data in Athena when CUR integration is configured.
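Here is a minimal sketch of the estimate-then-reconcile idea. The pre-dispatch estimate is requested capacity × expected duration × a unit rate. Reconciliation compares that estimate with the actual cost you would obtain by querying CUR data in Athena, which is passed in here as a number. The rates, field names, and 20% tolerance are placeholder assumptions, not AWS prices or SparkPilot defaults.

```python
from dataclasses import dataclass


@dataclass
class RunShape:
    executors: int
    vcpu_per_executor: float
    gib_per_executor: float
    expected_hours: float


# Placeholder unit rates for illustration; real prices depend on region, instance type, and pricing model.
VCPU_HOUR_RATE = 0.04
GIB_HOUR_RATE = 0.005


def estimate_cost(shape: RunShape) -> float:
    """Pre-dispatch estimate: capacity x duration x unit rate, for vCPU and memory."""
    vcpu_hours = shape.executors * shape.vcpu_per_executor * shape.expected_hours
    gib_hours = shape.executors * shape.gib_per_executor * shape.expected_hours
    return round(vcpu_hours * VCPU_HOUR_RATE + gib_hours * GIB_HOUR_RATE, 4)


def reconcile(estimate: float, cur_actual: float, tolerance: float = 0.2) -> dict:
    """Compare the estimate with a CUR-derived actual cost (for example, summed from an Athena query)."""
    variance = (cur_actual - estimate) / estimate if estimate else float("inf")
    return {
        "estimate": estimate,
        "actual": cur_actual,
        "variance": round(variance, 4),
        "within_tolerance": abs(variance) <= tolerance,
    }
```

For example, 10 executors with 4 vCPU and 16 GiB each, running for 2 hours, yield 80 vCPU-hours and 320 GiB-hours at the placeholder rates.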
Multi-Tenant Isolation
Tenants, teams, environments, and runs are fully scoped. Each environment gets its own namespace, IRSA bindings, and resource quotas. Teams share a cluster without interference.
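Per-environment isolation maps onto standard Kubernetes objects. The sketch below builds a `Namespace` and a `ResourceQuota` manifest as plain dicts for one tenant/team/environment. The `sp-<tenant>-<team>-<env>` naming scheme, the quota name, and the quota values are hypothetical. IRSA service-account bindings are left out.

```python
import re

# Kubernetes namespace names must be DNS-1123 labels: lowercase alphanumerics and '-', at most 63 chars.
DNS1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def environment_namespace(tenant: str, team: str, env: str) -> str:
    """Hypothetical convention: one namespace per tenant/team/environment."""
    name = f"sp-{tenant}-{team}-{env}".lower()
    if len(name) > 63 or not DNS1123_LABEL.match(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name


def environment_manifests(tenant: str, team: str, env: str,
                          cpu: str, memory: str, pods: int) -> list[dict]:
    """Return a Namespace and a ResourceQuota that caps what runs in that namespace can request."""
    ns = environment_namespace(tenant, team, env)
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": ns, "labels": {"tenant": tenant, "team": team}},
        },
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "spark-quota", "namespace": ns},
            "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory, "pods": str(pods)}},
        },
    ]
```

Because the quota is scoped to the namespace, one team's runs cannot consume another team's share of a shared cluster.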
Governance and Audit
Role-based access is enforced across SparkPilot APIs, with team-environment scopes, budget guardrails, and audit events for key control-plane actions.
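The sketch below combines the three controls described above. It checks a role and a team-environment scope, applies a hard budget block, and writes an audit event for every decision. The role names, action strings, and audit fields are hypothetical and do not describe SparkPilot's actual permission model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Principal:
    user: str
    role: str
    scopes: frozenset  # (team, environment) pairs this user may act on


# Hypothetical role-to-action mapping.
ROLE_ACTIONS = {
    "viewer": {"runs:read"},
    "submitter": {"runs:read", "runs:submit", "runs:cancel"},
    "admin": {"runs:read", "runs:submit", "runs:cancel", "budgets:write"},
}

AUDIT_LOG: list[dict] = []


def authorize_submit(p: Principal, team: str, env: str,
                     estimated_cost: float, spent: float, budget: float) -> bool:
    """Allow a submission only if role, scope, and budget all pass; audit every decision."""
    permitted = "runs:submit" in ROLE_ACTIONS.get(p.role, set()) and (team, env) in p.scopes
    within_budget = spent + estimated_cost <= budget  # hard block: this sketch has no override path
    allowed = permitted and within_budget
    reason = "ok" if allowed else ("scope" if not permitted else "budget")
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": p.user,
        "action": "runs:submit",
        "team": team,
        "environment": env,
        "decision": "allow" if allowed else "deny",
        "reason": reason,
    })
    return allowed
```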
Bring Your Own Cloud
SparkPilot runs inside your AWS account. Your VPC, S3 buckets, and IAM policies stay under your control. BYOC-Lite is designed to connect quickly to an existing EKS cluster; how quickly depends on your IAM and OIDC readiness.
Runtime Management
Three background workers (Scheduler, Reconciler, and Provisioner) manage the run and environment lifecycles. SparkPilot dispatches queued jobs to AWS, tracks state transitions, and links each run to its exact logs. You track runs in a dashboard, not a CloudWatch stream.
Structured Diagnostics
When a run fails, SparkPilot classifies the cause, such as an OOM kill, Spot interruption, S3 access denial, timeout, or user error. Engineers get a clear starting point for remediation.
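A simple way to picture failure classification is an ordered list of pattern rules applied to driver and executor log text, where the first match wins. The patterns below are illustrative assumptions based on common Kubernetes, JVM, and S3 error strings, not SparkPilot's actual rule set. Anything that matches no rule is reported as `unclassified`.

```python
import re

# Ordered rules: first match wins. Patterns are illustrative, not an exhaustive catalog.
RULES = [
    ("oom_kill", re.compile(r"OOMKilled|exit code:? 137|java\.lang\.OutOfMemoryError", re.I)),
    ("spot_interruption", re.compile(r"spot (instance )?interruption|SpotInterruption|node .* terminated", re.I)),
    ("s3_access_denied", re.compile(r"AccessDenied|Status Code: 403", re.I)),
    ("timeout", re.compile(r"timed out|deadline exceeded", re.I)),
    ("user_error", re.compile(r"AnalysisException|SyntaxError|ModuleNotFoundError")),
]


def classify_failure(log_text: str) -> str:
    """Return the first matching failure category, or 'unclassified'."""
    for label, pattern in RULES:
        if pattern.search(log_text):
            return label
    return "unclassified"
```

Rule order matters. Infrastructure causes are checked before user errors, so a run that was OOM-killed is not mislabeled because its log also contains an unrelated exception.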
Guided Onboarding
A step-by-step wizard validates cross-account trust, OIDC federation, namespace prerequisites, and execution role bindings, with actionable guidance for misconfigurations.
How SparkPilot handles run operations
Teams submit through API, CLI, Airflow, or Dagster. SparkPilot manages dispatch, state reconciliation, and diagnostics so operators do not stitch together raw AWS calls.
Scheduler
Polls for queued runs and dispatches them to EMR on EKS, EMR Serverless, or EMR on EC2, enforcing concurrency limits and environment-level queueing.
Reconciler
Continuously polls EMR for job state changes and records structured transitions from accepted to running, then to succeeded or failed. It also detects stalled runs and triggers timeout handling.
Provisioner
Manages the environment lifecycle, including BYOC-Lite and Full BYOC (in beta) provisioning, checkpoint recovery across Terraform stages, and environment teardown.
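The sketch below shows a single reconciliation pass. It validates an observed state change against an allowed-transition table and records a structured transition. If nothing has changed within a stall window, it fails the run with a timeout reason. The state names follow the text above. A real implementation would first map runtime-specific states onto them; EMR on EKS, for example, reports states such as PENDING, SUBMITTED, RUNNING, COMPLETED, and FAILED. The record schema and stall window are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ALLOWED = {
    "queued": {"accepted"},
    # Polling can miss a short running phase, so accepted -> succeeded is allowed.
    "accepted": {"running", "succeeded", "failed"},
    "running": {"succeeded", "failed"},
    "succeeded": set(),
    "failed": set(),
}
TERMINAL = {"succeeded", "failed"}


def reconcile_run(run: dict, observed_state: str, now: datetime, stall_after: timedelta) -> list[dict]:
    """One polling pass: return transition records and update `run` in place."""
    records = []
    current = run["state"]
    if observed_state != current:
        if observed_state not in ALLOWED[current]:
            raise ValueError(f"illegal transition {current} -> {observed_state}")
        records.append({"run_id": run["id"], "from": current, "to": observed_state, "at": now.isoformat()})
        run["state"], run["last_change"] = observed_state, now
    elif current not in TERMINAL and now - run["last_change"] > stall_after:
        # No progress inside the window: treat as stalled and hand off to timeout handling.
        records.append({"run_id": run["id"], "from": current, "to": "failed",
                        "at": now.isoformat(), "reason": "timeout"})
        run["state"], run["last_change"] = "failed", now
    return records
```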
One control plane, four Spark runtimes
SparkPilot routes submissions to EMR on EKS today, with beta support for EMR Serverless and EMR on EC2. Databricks routing is planned.
EMR on EKS
Runs on a native EMR virtual cluster on your EKS cluster for production Spark workloads.
EMR Serverless (beta)
Submit to an EMR Serverless application for fully managed capacity. No EKS cluster required.
EMR on EC2 (beta)
Dispatch to existing EMR on EC2 clusters via step submission. Integrates with your current EC2-based Spark estate.
Databricks (planned)
Planned support for routing to the Databricks Jobs API from the SparkPilot control plane.
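Routing comes down to producing a different AWS request shape for each runtime. The sketch below builds, but does not send, the boto3 calls for the three EMR targets: `emr-containers` `start_job_run`, `emr-serverless` `start_job_run`, and `emr` `add_job_flow_steps`. The run name, the release label, and the runtime keys are assumptions for illustration. It is not SparkPilot's dispatch code.

```python
def build_submission(runtime: str, target_id: str, role_arn: str,
                     entry_point: str, spark_params: str = "") -> tuple[str, str, dict]:
    """Return (boto3 client name, method name, kwargs) for one Spark job. Nothing is sent to AWS."""
    driver = {"entryPoint": entry_point}
    if spark_params:
        driver["sparkSubmitParameters"] = spark_params

    if runtime == "emr-on-eks":
        return "emr-containers", "start_job_run", {
            "name": "sparkpilot-run",            # hypothetical run name
            "virtualClusterId": target_id,
            "executionRoleArn": role_arn,
            "releaseLabel": "emr-7.0.0-latest",  # assumption: use a release your cluster supports
            "jobDriver": {"sparkSubmitJobDriver": driver},
        }
    if runtime == "emr-serverless":
        return "emr-serverless", "start_job_run", {
            "applicationId": target_id,
            "executionRoleArn": role_arn,
            "name": "sparkpilot-run",
            "jobDriver": {"sparkSubmit": dict(driver)},
        }
    if runtime == "emr-on-ec2":
        # Step submission via command-runner.jar; role_arn is unused in this sketch.
        args = ["spark-submit", *spark_params.split(), entry_point]
        return "emr", "add_job_flow_steps", {
            "JobFlowId": target_id,
            "Steps": [{
                "Name": "sparkpilot-run",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
            }],
        }
    raise ValueError(f"unsupported runtime: {runtime}")
```

With boto3 installed and credentials configured, `getattr(boto3.client(client_name), method_name)(**kwargs)` would submit the job. Keeping request construction separate from the network call makes the routing logic easy to test.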
From pilot kickoff to rollout in five steps
Define pilot scope
Align on one workload family, success criteria, and owner roles before setup starts. This keeps pilot scope clear and measurable.
Connect your AWS account
Create the cross-account IAM role and OIDC association. SparkPilot validates trust, permissions, and namespace prerequisites with clear remediation steps.
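Cross-account roles are commonly secured with an `sts:ExternalId` condition. The sketch below builds such a trust policy and runs the kind of basic validation an onboarding check might perform. The account IDs and external ID are placeholders, and SparkPilot's actual required policy may differ. The validator treats `Action` as a single string, although IAM also accepts a list.

```python
def cross_account_trust_policy(control_plane_account_id: str, external_id: str) -> dict:
    """Trust policy allowing a (hypothetical) control-plane account to assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{control_plane_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def validate_trust_policy(policy: dict, control_plane_account_id: str, external_id: str) -> list[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems = []
    principal = f"arn:aws:iam::{control_plane_account_id}:root"
    statements = [s for s in policy.get("Statement", [])
                  if s.get("Effect") == "Allow" and s.get("Action") == "sts:AssumeRole"]
    if not statements:
        problems.append("No Allow statement for sts:AssumeRole.")
    for s in statements:
        if s.get("Principal", {}).get("AWS") != principal:
            problems.append(f"Principal should be {principal}.")
        if s.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId") != external_id:
            problems.append("Missing or mismatched sts:ExternalId condition.")
    return problems
```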
Choose deployment model
BYOC-Lite connects to your existing EKS cluster quickly. Full BYOC is in beta for teams that need VPC, EKS, and EMR provisioning from Terraform modules.
Submit your first governed run
Encode submission patterns as versioned templates, including Spot configurations, Graviton instance preferences, S3 Express paths, container images, and Spark configuration baselines.
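A versioned template can be modeled as an immutable baseline that is merged with per-run overrides. The sketch below covers the container image, Graviton placement, Spot executors, and a Spark configuration baseline, using standard Spark-on-Kubernetes node-selector settings. The `RunTemplate` shape and the `name@vN` identifier are hypothetical, and S3 Express paths are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RunTemplate:
    # Hypothetical template shape, for illustration only.
    name: str
    version: int
    image: str
    spot_executors: bool = True
    graviton: bool = True
    spark_conf: dict = field(default_factory=dict)

    def render(self, overrides: Optional[dict] = None) -> dict:
        """Merge the versioned baseline with per-run overrides; overrides win."""
        conf = {"spark.kubernetes.container.image": self.image, **self.spark_conf}
        if self.graviton:
            # Standard Kubernetes arch label; this selector applies to driver and executors.
            conf["spark.kubernetes.node.selector.kubernetes.io/arch"] = "arm64"
        if self.spot_executors:
            # Capacity-type label set by EKS managed node groups; executor-only selectors need Spark 3.3+.
            conf["spark.kubernetes.executor.node.selector.eks.amazonaws.com/capacityType"] = "SPOT"
        conf.update(overrides or {})
        return {"template": f"{self.name}@v{self.version}", "spark_conf": conf}
```

Only executors are pinned to Spot. This keeps the driver on On-Demand capacity, so a Spot interruption costs executors rather than the whole run.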
Review outcomes and decide rollout
Compare pilot results against your success criteria, including reliability, diagnostics, and cost visibility. Then move to production rollout with the same control plane.
Use SparkPilot from orchestrators, terminal, or API
SparkPilot supports workflow engines and engineer-first interfaces, so teams can adopt it through existing DAGs, CI pipelines, and terminal-driven operations.
Airflow
SparkPilotSubmitRunOperator with full deferrable trigger support. Drop it into any existing DAG and run it synchronously or as a deferrable task.
Dagster
Native @asset definitions and ops for run submission, polling, and cancellation. Works with Dagster Cloud and OSS.
CLI
Engineers can submit, inspect, cancel, and tail runs from terminal workflows without opening the dashboard.
REST API
Teams can integrate SparkPilot into internal portals and automation jobs through authenticated REST endpoints.
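To show what calling an authenticated REST endpoint from an automation job could look like, the sketch below uses only the Python standard library. The `/v1/runs` path, the bearer-token scheme, and the JSON body fields are hypothetical placeholders. Check the actual SparkPilot API reference for the real endpoints.

```python
import json
import urllib.request


def build_submit_request(base_url: str, token: str, template: str, environment: str) -> urllib.request.Request:
    """Build (but do not send) a POST to a hypothetical /v1/runs endpoint."""
    body = json.dumps({"template": template, "environment": environment}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/runs",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

An automation job would then send it with `urllib.request.urlopen(req, timeout=30)` and parse the JSON response.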
What you don't get with DIY or EMR Serverless
DIY gives you primitives. EMR Serverless removes cluster management. Neither gives you a multi-tenant control plane with built-in governance.
| Capability | DIY on AWS | EMR Serverless | SparkPilot |
|---|---|---|---|
| Preflight IAM/OIDC validation | |||
| Multi-tenant namespace isolation on EKS | |||
| CUR-aligned cost attribution per team | |||
| Budget guardrails with hard-block | |||
| Spot diversification validation at preflight | |||
| Airflow and Dagster native integrations | |||
| Kubernetes-native control plane | |||
| No infra management required | | | |
This table shows what SparkPilot adds beyond base AWS primitives. The DIY column reflects capabilities you can build yourself, while SparkPilot ships them configured and enforced.
Common questions, honest answers
We outline tradeoffs so your team can choose the right path.
Why not build it yourself?
Reaching parity takes 130 to 250 hours, plus 40 to 80 hours of ongoing maintenance per month. Here is an honest cost accounting of DIY EMR on EKS.
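The two ranges combine into a first-year figure. The calculation assumes maintenance applies to all twelve months of year one, with build effort $H_{\text{build}} \in [130, 250]$ hours and monthly maintenance $H_{\text{maint}} \in [40, 80]$ hours:

$$
H_{\text{year 1}} = H_{\text{build}} + 12\,H_{\text{maint}}, \qquad
610 = 130 + 12 \cdot 40 \;\le\; H_{\text{year 1}} \;\le\; 250 + 12 \cdot 80 = 1210 \text{ hours}
$$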
Why not EMR Serverless?
Cold-start latency, no persistent clusters, no YuniKorn, and no BYOC. Learn when Serverless is the right answer and when it is not.
Start with a guided Spark pilot
Share your workload profile and we will map a practical pilot plan with clear success criteria, owner responsibilities, and rollout options.