Do You Need an ETL Tool? A Decision Framework
One of the first questions organizations face when building a data stack is whether to invest in a dedicated ETL tool or build pipelines themselves. I've helped multiple organizations make this decision and used both approaches hands-on, so here's how I think about it.
The Build vs Buy Spectrum
There's no single right answer. The choice depends on your data sources, team capabilities, and how much you value time versus money.
When Custom Scripts Make Sense
You have few data sources. If you're pulling from 2-3 APIs and loading into a warehouse, a Python script might be all you need. Don't over-engineer a simple problem.
Your sources are unusual. Managed tools excel at common integrations - Salesforce, Stripe, Google Analytics. If your data comes from proprietary systems, internal APIs, or legacy databases with custom schemas, you may need custom code regardless.
You have strong engineering resources. A team that's comfortable maintaining data pipelines can build exactly what they need without the constraints of pre-built connectors.
Budget is extremely tight. Managed ETL tools cost money. For early-stage startups, a well-written script that runs on a cron job might be the right choice until you can justify the investment.
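To make the "script on a cron job" option concrete, here is a minimal sketch of an incremental extract-and-load job. It rests on several assumptions that are not from any particular tool: SQLite stands in for the warehouse, the `orders` table and the `fetch_since` callable are hypothetical, and the source is assumed to expose an `updated_at` timestamp in a consistent ISO-8601 format.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical record shape: (id, updated_at as ISO-8601 text, JSON payload).
Record = tuple[int, str, str]


def sync(conn: sqlite3.Connection,
         fetch_since: Callable[[str], Iterable[Record]]) -> int:
    """Load records changed since the last run; return how many were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, updated_at TEXT, payload TEXT)"
    )
    # High-water mark: the newest timestamp already loaded ('' on first run).
    (cursor,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM orders"
    ).fetchone()
    rows = list(fetch_since(cursor))
    # Replace on primary key so re-running after a failure is harmless.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

In practice `fetch_since` would wrap the API call for one source, and a scheduler such as cron would invoke the script on whatever interval the data needs. Even this small version already has to decide how to track progress and how to survive re-runs, which is a fair preview of where the effort goes.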
When Managed Tools Make Sense
You have many SaaS data sources. Once you're pulling from 10+ sources - marketing platforms, payment systems, CRM, support tools - maintaining custom connectors becomes a full-time job. Tools like Fivetran have pre-built connectors that handle API changes, rate limits, and pagination.
Schema changes break your pipelines. Third-party APIs change without warning. Managed tools handle these changes - that's literally what you're paying for. One unexpected Shopify API update at 2 AM is enough to make the subscription feel worthwhile.
Your team should focus elsewhere. Every hour spent maintaining data pipelines is an hour not spent on analysis or building data products. If pipeline maintenance is distracting from higher-value work, it's time to outsource it.
You need reliability guarantees. Managed tools typically come with SLAs, monitoring, alerting, and support. For business-critical data pipelines, this matters.
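For a sense of what handling schema changes involves when you own the pipeline, here is a small sketch of a drift check. It is not how any managed tool works internally. The function name and the example column names are hypothetical, and it only illustrates the comparison a custom pipeline would need to run before loading.

```python
def schema_drift(expected: set[str], record: dict) -> dict[str, set[str]]:
    """Compare a record's keys with the columns the pipeline expects."""
    actual = set(record)
    return {
        "missing": expected - actual,      # columns the source stopped sending
        "unexpected": actual - expected,   # columns the source started sending
    }


# Example: the source renamed "email" to "email_address".
drift = schema_drift({"id", "email"}, {"id": 1, "email_address": "a@example.com"})
if drift["missing"] or drift["unexpected"]:
    print(f"Schema drift detected: {drift}")
```

Detecting the drift is the easy part. Deciding whether to add a column, backfill it, or halt the load is the work a subscription takes off your hands.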
The Hidden Costs of "Free"
Building your own looks cheaper on paper. No subscription fees. Just developer time.
But consider the true costs:
Initial development: Writing a robust connector takes longer than you think. Error handling, retry logic, incremental loads, schema detection - it adds up.
Ongoing maintenance: APIs change. Rate limits shift. Authentication methods evolve. Each change requires developer attention.
Monitoring and alerting: You need to know when pipelines fail. Building observability is work.
Documentation: When the person who built it leaves, can someone else maintain it?
Opportunity cost: What else could your team build with that time?
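To show how one item on this list expands, here is a minimal sketch of retry logic with exponential backoff and jitter. The function name, defaults, and the 10% jitter are illustrative assumptions. A real connector would also narrow the caught exceptions to transient errors and respect any retry hints the API returns.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying failures with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # Assumption: every failure is treated as transient.
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Delay doubles each time (1x, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists so the function can be tested without waiting. Multiply this by pagination, authentication refresh, and incremental state, and the development estimate grows quickly.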
I've seen teams spend 20+ hours per month maintaining "free" custom pipelines. At a typical engineering hourly rate, that time alone can cost more than the subscription for many managed tools.
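The comparison is simple arithmetic, sketched below. The 20 hours comes from the observation above. The hourly rate and the subscription price are placeholder assumptions, not quotes from any vendor, so substitute your own numbers.

```python
def monthly_maintenance_cost(hours_per_month: float, hourly_rate: float) -> float:
    """Cost of engineer time spent keeping custom pipelines running."""
    return hours_per_month * hourly_rate


def break_even_hours(subscription_per_month: float, hourly_rate: float) -> float:
    """Maintenance hours per month at which a subscription costs the same."""
    return subscription_per_month / hourly_rate


# Assumed figures for illustration only.
HOURLY_RATE = 75.0      # hypothetical fully loaded engineering rate, USD
SUBSCRIPTION = 1000.0   # hypothetical managed-tool price, USD per month

print(monthly_maintenance_cost(20, HOURLY_RATE))               # 1500.0
print(round(break_even_hours(SUBSCRIPTION, HOURLY_RATE), 1))   # 13.3
```

Under these assumed numbers, anything beyond roughly 13 hours of maintenance a month costs more than the subscription, before counting opportunity cost.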
The Hybrid Approach
Most mature data teams end up with a hybrid approach:
Managed tools for common sources. Use Fivetran or Airbyte for standard SaaS integrations. These are solved problems - don't re-solve them.
Custom code for unique sources. Build connectors for internal systems, proprietary databases, or unusual APIs where no pre-built option exists.
Orchestration to tie it together. Tools like Airflow or Dagster coordinate both managed and custom pipelines in a unified workflow.
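The core idea behind orchestration is dependency ordering: downstream steps run only after the managed syncs and custom extracts they rely on. The sketch below shows that idea with Python's standard-library `graphlib` rather than the Airflow or Dagster APIs, and every task name in it is hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each maps to the set of tasks it depends on.
PIPELINE: dict[str, set[str]] = {
    "managed_sync_salesforce": set(),
    "managed_sync_stripe": set(),
    "custom_extract_internal_db": set(),
    "build_revenue_model": {"managed_sync_stripe", "custom_extract_internal_db"},
    "build_dashboard": {"build_revenue_model", "managed_sync_salesforce"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


for task in run_order(PIPELINE):
    print(f"running {task}")
```

A real orchestrator adds scheduling, retries, parallelism, and alerting on top of this ordering, which is why most teams adopt one rather than writing their own.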
Decision Framework
Ask yourself these questions:
1. How many data sources do you have? Fewer than 5 with simple APIs? Custom might work. More than 10? Strongly consider managed. In between, let the remaining questions decide.
2. How often do source APIs change? Frequently changing APIs favor managed tools with dedicated connector maintenance.
3. What's your team's capacity? If you're stretched thin, buy time by buying tools.
4. How critical is data freshness? If delays matter, the reliability of managed tools may justify the cost.
5. What's your growth trajectory? Starting with 3 sources but planning for 20? Start with a tool that scales.
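If it helps to see the questions as explicit rules, here is one possible encoding. The source-count thresholds come from question 1. Everything else, including the field names, the equal weighting of questions 2 to 4, and the "either" outcome, is an assumption made for illustration rather than a validated scoring model.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    source_count: int           # question 1: sources today
    planned_source_count: int   # question 5: sources you expect to reach
    apis_change_often: bool     # question 2
    team_stretched_thin: bool   # question 3
    freshness_critical: bool    # question 4


def recommend(s: Situation) -> str:
    """Return 'managed', 'custom', or 'either' for a given situation."""
    # More than 10 sources, now or planned, points strongly to managed.
    if max(s.source_count, s.planned_source_count) > 10:
        return "managed"
    # Assumed weighting: questions 2 to 4 each count as one vote for managed.
    votes = sum([s.apis_change_often, s.team_stretched_thin, s.freshness_critical])
    if s.source_count < 5 and votes == 0:
        return "custom"
    return "managed" if votes >= 2 else "either"


print(recommend(Situation(3, 20, False, False, False)))  # managed
```

Treat the output as a prompt for discussion, not a verdict. The value is in forcing an explicit answer to each question.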
My Recommendation
For most organizations building a modern data stack, start with a managed ETL tool for your SaaS sources. The time savings are real, and you can always build custom connectors for specific needs later.
The tools have matured significantly. Fivetran pioneered the space, but options like Airbyte (open-source with a managed offering) provide flexibility if you're cost-conscious or need more control.
Don't let "we can build it ourselves" become a trap that consumes engineering capacity better spent elsewhere.