10 Things Great Data Engineers Do
Good vs. Bad Data Engineers - the mindset, habits, and behaviors
Hey friends - Happy Tuesday!
I’ve built 5 Data Warehouses, 2 Data Lakes, and 1 full Lakehouse system over the last 15+ years - and I can honestly say:
I’ve met every kind of data engineer you can imagine.
Today, I want to share with you the things that separate elite engineers from the ones who quietly fade into the background.
Let’s dive in 👇
1. Design for Failure
Bad Engineer: Assumes the system will run perfectly forever.
Good Engineer: Assumes everything will fail, and prepares rollbacks, retries, alerts, and audit trails.
Real data pipelines don’t just deliver data. They recover from disaster.
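Here is a minimal Python sketch of one part of designing for failure. It retries a step with exponential backoff and records every attempt in an audit trail. It assumes a step is any zero-argument callable. The name `run_with_retries`, the in-memory audit list, and the use of `log.error` as the alert hook are illustrative choices, not a specific framework's API.

```python
import logging
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

log = logging.getLogger("pipeline")  # hypothetical logger name


def run_with_retries(
    step: Callable[[], T],
    name: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    audit: Optional[List[dict]] = None,
) -> T:
    """Run `step`; on failure wait base_delay * 2**(attempt-1) seconds and retry.

    Every attempt is appended to `audit` so failures leave a trail.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
        except Exception as exc:  # a real pipeline would retry only transient errors
            if audit is not None:
                audit.append({"step": name, "attempt": attempt,
                              "status": "failed", "error": str(exc)})
            if attempt == max_attempts:
                # Out of retries: this is the point to alert a human.
                log.error("step %s failed after %d attempts: %s", name, attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("step %s attempt %d failed (%s); retrying in %.1fs",
                        name, attempt, exc, delay)
            time.sleep(delay)
        else:
            if audit is not None:
                audit.append({"step": name, "attempt": attempt, "status": "success"})
            return result
    raise AssertionError("unreachable")
```

A call could look like `run_with_retries(extract_orders, "extract_orders", audit=audit_log)`, where `extract_orders` and `audit_log` are hypothetical. Rollback is not shown here. One common approach is to load into a staging table and swap it in only after the load and its checks succeed.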
2. Build for Scale Before It Hurts
Bad Engineer: Says, “It used to run fine - last year it took 2 hours, now it takes 6. Not my fault the data grew.”
Good Engineer: Plans ahead. Knows the data will grow. Uses smart partitioning, efficient joins, scalable formats, and parallelism - long before it becomes a bottleneck.
Real pipelines keep working as data explodes.
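Two of these habits, partitioning and parallelism, can be sketched in plain Python. Each run processes only the partitions it has not seen yet, so daily cost follows new data rather than total history. The `column=value` partition names follow the Hive-style directory convention. The `event_date` column and the function names are hypothetical, and a real pipeline would leave this work to its engine or table format.

```python
from __future__ import annotations

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import Callable


def partition_by_day(rows: list[dict], date_key: str = "event_date") -> dict[str, list[dict]]:
    """Group rows into Hive-style partitions such as 'event_date=2024-05-01'."""
    partitions: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        day: date = row[date_key]
        partitions[f"{date_key}={day.isoformat()}"].append(row)
    return dict(partitions)


def new_partitions(available: set[str], already_loaded: set[str]) -> list[str]:
    """Incremental load: only partitions that have not been processed yet."""
    return sorted(available - already_loaded)


def process_in_parallel(
    partitions: dict[str, list[dict]],
    fn: Callable[[list[dict]], object],
    max_workers: int = 4,
) -> dict[str, object]:
    """Apply `fn` to each partition concurrently; results keyed by partition name."""
    names = list(partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fn, (partitions[n] for n in names)))
    return dict(zip(names, results))
```

Threads suit I/O-bound work such as reading files or calling a warehouse. CPU-heavy transformations in Python would need processes, or the engine's own parallelism.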
3. Simplicity Wins
Bad Engineer: Builds the most clever pipelines.
Good Engineer: Builds the clearest, simplest pipelines - ones anyone on the team can extend, debug, and maintain.
If your pipeline needs a 30-minute explanation, it’s not impressive - it’s fragile.
4. Build for Others
Bad Engineer: Writes code only they understand. No comments. No logs. No context.
Good Engineer: Adds logs for operators, comments for complex logic, and names things clearly - as if someone else will debug it later. Because someone will.
The best engineers don’t build for themselves - they build so others can succeed.
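Here is a small sketch of what building for others can look like in code. It uses a descriptive function name and a docstring that explains why the rule exists, not just what it does. Its log lines carry counts and a batch identifier that an operator can search for. The `orders_pipeline` logger, the `customer_id` rule, and the `batch_id` parameter are hypothetical.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("orders_pipeline")  # hypothetical pipeline name


def drop_rows_without_customer(rows: list[dict], batch_id: str) -> list[dict]:
    """Remove rows that cannot be linked to a customer.

    Why (hypothetical rule): downstream revenue reports inner-join on
    customer_id, so these rows would vanish there silently; dropping them
    here, with a log line, makes the loss visible.
    """
    kept = [r for r in rows if r.get("customer_id") is not None]
    dropped = len(rows) - len(kept)
    # Operators need counts and identifiers, not just "done".
    logger.info(
        "batch=%s step=drop_rows_without_customer input=%d kept=%d dropped=%d",
        batch_id, len(rows), len(kept), dropped,
    )
    if dropped:
        logger.warning("batch=%s dropped %d rows with missing customer_id", batch_id, dropped)
    return kept
```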
5. Keep Layers Clean
Bad Engineer: Mixes everything into one giant job - ingestion, cleaning, business logic, all in one place. Or worse… spreads logic across 10 unclear steps with no structure.
Good Engineer: Builds with separation of concerns - raw data in one layer, cleaned data in another, business logic clearly isolated. Bronze, Silver, Gold.
Clean layers don’t just organize data - they make pipelines debuggable, testable, and built to last.
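The Bronze, Silver, Gold split can be sketched as three small functions, each with one job:

- Bronze lands raw lines untouched, plus load metadata.
- Silver parses, types, and deduplicates them.
- Gold applies the business rule.

The CSV layout, the `orders.csv` source name, and the revenue-per-country rule are hypothetical. The point is that each layer can be tested on its own.

```python
from __future__ import annotations

from collections import defaultdict


def to_bronze(raw_lines: list[str], source: str = "orders.csv") -> list[dict]:
    """Bronze: land the data exactly as received, plus load metadata."""
    return [{"raw": line, "source": source} for line in raw_lines]


def to_silver(bronze: list[dict]) -> list[dict]:
    """Silver: parse, type, and deduplicate. No business rules here."""
    cleaned: list[dict] = []
    seen: set[str] = set()
    for rec in bronze:
        parts = [p.strip() for p in rec["raw"].split(",")]
        if len(parts) != 3:
            continue  # malformed; a real pipeline would quarantine it
        order_id, country, amount = parts
        try:
            value = float(amount)
        except ValueError:
            continue  # non-numeric amount; also a quarantine candidate
        if order_id in seen:
            continue  # duplicate order
        seen.add(order_id)
        cleaned.append({"order_id": order_id, "country": country.upper(), "amount": value})
    return cleaned


def to_gold(silver: list[dict]) -> dict[str, float]:
    """Gold: business logic only. Here, total revenue per country."""
    revenue: dict[str, float] = defaultdict(float)
    for row in silver:
        revenue[row["country"]] += row["amount"]
    return dict(revenue)
```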
6. When Things Get Slow
Bad Engineer: Pipeline gets slow? “We need more clusters.” “Add more indexes.” When that doesn’t work, they blame the platform and push for a costly migration - only to find the jobs are still slow.
Good Engineer: Digs deeper. Blames their own design first. Rethinks the logic, inspects the joins, traces every step - and finds the step that breaks everything.
Because performance isn’t a platform problem - it’s a design responsibility.
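Before asking for a bigger cluster, measure. This minimal sketch times each pipeline step with a context manager. It also checks for one common culprit: a join on a non-unique key that multiplies rows. The names `timed`, `slowest_step`, and `join_fanout` are hypothetical helpers, not part of any library.

```python
from __future__ import annotations

import time
from collections import Counter
from collections.abc import Hashable, Iterator, Sequence
from contextlib import contextmanager


@contextmanager
def timed(step: str, timings: dict[str, float]) -> Iterator[None]:
    """Record the wall-clock seconds spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = time.perf_counter() - start


def slowest_step(timings: dict[str, float]) -> str:
    """Name of the step that took the longest."""
    return max(timings, key=lambda s: timings[s])


def join_fanout(left_keys: Sequence[Hashable], right_keys: Sequence[Hashable]) -> int:
    """Rows an inner equi-join would return. A result much larger than
    len(left_keys) means duplicate keys on the right side multiply rows."""
    right_counts = Counter(right_keys)
    return sum(right_counts[k] for k in left_keys)
```

Suppose `join_fanout(order_keys, customer_keys)` returns far more than `len(order_keys)`. Then the customer side has duplicate keys and the join is exploding, a design problem that extra hardware only hides.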
7. Mindset: Metadata-Driven Thinking
Bad Engineer: Sees every new table or source as a new script, new job, new pipeline - and starts copying code.
Good Engineer: Designs for reuse. Builds pipelines driven by metadata, not hardcoded logic. New source or table? Just add a config entry. Plug and play.
It’s not about scaling code - it’s about scaling patterns.
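Here is a minimal sketch of metadata-driven loading. Each source is one config entry, and a single generic function turns any entry into an extract query. The `SourceConfig` fields, the table names, and the full vs. incremental watermark logic are hypothetical. In practice the config could live in a YAML file or a control table instead of code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceConfig:
    source_table: str
    target_table: str
    load_type: str = "full"  # "full" or "incremental"
    watermark_column: Optional[str] = None


# Adding a source means adding a line here, not writing a new pipeline.
SOURCES = [
    SourceConfig("crm.customers", "silver.customers"),
    SourceConfig("erp.orders", "silver.orders", "incremental", "updated_at"),
]


def build_extract_query(cfg: SourceConfig, last_watermark: Optional[str]) -> str:
    """Turn one config entry into an extract query.

    Illustration only: real code should pass the watermark as a bind parameter
    instead of formatting it into the SQL string.
    """
    base = f"SELECT * FROM {cfg.source_table}"
    if cfg.load_type == "full":
        return base
    if cfg.load_type == "incremental":
        if cfg.watermark_column is None:
            raise ValueError(f"{cfg.source_table}: incremental load needs a watermark_column")
        if last_watermark is None:
            return base  # first run: nothing loaded yet, take everything
        return f"{base} WHERE {cfg.watermark_column} > '{last_watermark}'"
    raise ValueError(f"{cfg.source_table}: unknown load_type {cfg.load_type!r}")


def plan(sources: list[SourceConfig], watermarks: dict[str, str]) -> dict[str, str]:
    """One generic loop for every source: target table -> extract query."""
    return {cfg.target_table: build_extract_query(cfg, watermarks.get(cfg.source_table))
            for cfg in sources}
```

Onboarding a new table then means appending one `SourceConfig(...)` line to `SOURCES`, with no new script to write or copy.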
8. Know How to Model Data
Bad Engineer: Just lands the data. No structure, no naming conventions, no documentation. Leaves consumers to figure it out. And if they do attempt modeling, it’s just one giant denormalized table.
Good Engineer: Integrates sources into clean models, chooses the right structure, and delivers well-documented, easy-to-use datasets.
If you can’t model data, you can’t make it usable.
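One common structure for analytics is a star schema. This sketch splits a flat, denormalized extract into two tables. The first is a customer dimension with a surrogate key. The second is a slim sales fact table that holds only keys and measures. The column names are hypothetical. The sketch keeps the first attributes it sees for each customer, so it does not track history (slowly changing dimensions).

```python
from __future__ import annotations


def build_star_schema(flat_rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split flat sales rows into (dim_customer, fact_sales)."""
    dim_by_email: dict[str, dict] = {}  # natural key -> dimension row
    fact_sales: list[dict] = []
    for row in flat_rows:
        natural_key = row["customer_email"]
        if natural_key not in dim_by_email:
            dim_by_email[natural_key] = {
                "customer_sk": len(dim_by_email) + 1,  # surrogate key
                "customer_email": natural_key,
                "customer_name": row["customer_name"],
                "country": row["country"],
            }
        # The fact row references the dimension by key instead of repeating attributes.
        fact_sales.append({
            "order_id": row["order_id"],
            "customer_sk": dim_by_email[natural_key]["customer_sk"],
            "order_date": row["order_date"],
            "amount": row["amount"],
        })
    return list(dim_by_email.values()), fact_sales
```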
9. Green Means Nothing
Bad Engineer: Sees the job status turn “green”: “Perfect, my job is done and everything is fine.”
Good Engineer: “Green just means it didn’t crash.” Adds post-load checks for freshness, nulls, duplicates, row counts, and schema drift. They trust nothing.
Pipelines lie. If your data isn’t trusted, nothing you build on it has meaning.
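These post-load checks can be sketched as one function that returns a list of failed checks. An empty list means the data, not just the job, looks healthy. The thresholds, column names, and function signature are hypothetical. Dedicated data-quality tools offer richer versions of the same checks.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Optional


def validate_load(
    rows: list[dict],
    expected_columns: set[str],
    key_column: str,
    not_null_columns: list[str],
    loaded_at_column: str,
    min_rows: int,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return the checks that failed; an empty list means the load looks healthy."""
    if now is None:
        now = datetime.now(timezone.utc)
    failures: list[str] = []

    # Row count: a "successful" job that loaded nothing is still a failure.
    if len(rows) < min_rows:
        failures.append(f"row_count: {len(rows)} < {min_rows}")
    if not rows:
        return failures

    # Schema drift: columns added or removed upstream.
    actual = set().union(*(r.keys() for r in rows))
    if actual != expected_columns:
        failures.append(f"schema_drift: missing={sorted(expected_columns - actual)} "
                        f"extra={sorted(actual - expected_columns)}")

    # Nulls in columns that must always be populated.
    for col in not_null_columns:
        nulls = sum(1 for r in rows if r.get(col) is None)
        if nulls:
            failures.append(f"nulls: {col} has {nulls} null values")

    # Duplicates on the business key.
    keys = [r.get(key_column) for r in rows]
    dupes = len(keys) - len(set(keys))
    if dupes:
        failures.append(f"duplicates: {dupes} duplicate {key_column} values")

    # Freshness: the newest record must be recent enough.
    stamps = [r[loaded_at_column] for r in rows if r.get(loaded_at_column) is not None]
    if not stamps or now - max(stamps) > max_age:
        failures.append(f"freshness: newest {loaded_at_column} older than {max_age}")

    return failures
```

Call it right after the load. If `validate_load(...)` returns anything, fail the run or send an alert, even though the job itself finished “green”.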
10. Own the End-to-End
Bad Engineer: Blames upstream. Ignores downstream. Works only on their “step.”
Good Engineer: Sees the full picture, from ingestion to dashboard and from raw files to real decisions. Understands how their choices affect analysts, reports, ML models, and data costs.
The best data systems work because someone cared about the whole journey.
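Owning the end-to-end picture is easier when lineage is written down. This sketch stores a hypothetical lineage map, listing each asset and the assets that read from it. It walks the map breadth-first to list everything downstream of a change, from tables to dashboards and ML models. The asset names are made up for illustration.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it.
LINEAGE: dict[str, list[str]] = {
    "raw.orders": ["silver.orders"],
    "silver.orders": ["gold.daily_revenue", "gold.customer_ltv"],
    "gold.daily_revenue": ["dashboard.finance"],
    "gold.customer_ltv": ["ml.churn_model"],
}


def downstream_impact(asset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk: every asset affected if `asset` changes, nearest first."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order
```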
Final Thoughts
At the end of the day, it’s not about the tool you use - it’s about how you design for chaos, change, and scale.
Here’s what I try to live by - maybe it helps you too 👇
I design like everything will fail - because it will
I build simple - so others can build on top
I plan for scale - before it becomes a problem
I organize with intention - raw, clean, and business logic kept separate
I never trust “green” - I validate the data
I think metadata-first - so new tables become config, not chaos
I model with clarity - structure, naming, and real usability
I solve bottlenecks - before blaming platforms
I think end-to-end: from source to decision
I question every load - because pipelines lie
And if you made it this far, you’re already one of the good ones.
Catch you next time,
Baraa 👋
Hey friends -
I’m Baraa. I’m an IT professional and YouTuber.
My mission is to share the knowledge I’ve gained over the years and to make working with data easier, more fun, and accessible to everyone through courses that are free, simple, and easy!