The Developer Who Deleted Production on Their First Day

A few days ago I read the story of a developer who joined a company as a Junior Software Developer. On their first day they had to set up their development environment, so their superiors sent them a document explaining how to create it. The process was fairly simple: run a script that would, among other things, create a copy of the database with test data, and then run the tests to verify that everything was correctly configured.

The problem came when the developer used the production database credentials instead of those of their local copy. The test run deleted what was already in the database and filled it with test data. The result: all production data was gone.

The response from the company and its CTO was immediate. The employee was fired almost instantly, despite having offered to help, and was even threatened with legal action because of the severity of the incident. But was the company's behavior justified?

The Community Speaks: GitLab and Amazon

This is not the first time a company has lost data due to human error, and large companies are not immune either. Something similar happened to GitLab in early 2017, when someone accidentally deleted about 6 hours of data from their databases. GitLab estimated that the lost data covered around 5,000 projects, 5,000 comments, and 700 new accounts.

Amazon has suffered this problem too. In December 2012, a developer with access to the production environment deleted state data from the ELB (Amazon Elastic Load Balancing) service while performing maintenance tasks.

The GitLab engineer who deleted that data (yorickpeterse on Reddit) left an excellent comment on the Reddit post, arguing that the blame did not lie with the developer, or at least not entirely.

While it is true that the new developer was the one who deleted the data, the company also seemed to lack clear protocols and procedures for its database:

The company sent a document with production credentials to a person on their first day.
The company did not create a read-only user for the newcomer; instead it handed over administrator access to the entire database, which makes no sense for someone who only needed to read from it.
The company created development environments directly from the production database, instead of having a clone of the production database ready for development.
The company's CTO blamed the new developer rather than focusing on preventing this from happening again. In fact, it could happen again to a developer already on staff, simply because "we are human."
The tools used in the copy process did not check that the database they were targeting was the correct one.
The company did not assign anyone to guide the newcomer on their first day: they received a document containing production credentials and were left to fend for themselves.
The company had no backups; if it had, recovery would have been as simple as restoring everything after the incident.
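
The point about tools not checking their target is the easiest one to illustrate. What follows is only a minimal sketch of such a check, not what the company in the story used: the function name `assert_safe_target`, the host allowlist, and the `_dev` / `_test` naming convention are all assumptions made for this example. The idea is that a destructive setup script refuses to run unless the connection URL clearly points at a local, disposable database.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts a development setup script may touch.
SAFE_HOSTS = {"localhost", "127.0.0.1", "::1"}
# Hypothetical naming convention for disposable databases.
SAFE_DB_SUFFIXES = ("_dev", "_test")


class UnsafeTargetError(RuntimeError):
    """Raised when a destructive script is pointed at a non-local database."""


def assert_safe_target(db_url: str) -> None:
    """Raise UnsafeTargetError unless db_url is a local dev/test database."""
    parsed = urlparse(db_url)
    host = parsed.hostname or ""
    db_name = parsed.path.lstrip("/")

    # Check the host first: anything outside the allowlist is rejected.
    if host not in SAFE_HOSTS:
        raise UnsafeTargetError(f"refusing to run against host {host!r}")
    # Then check the database name against the naming convention.
    if not db_name.endswith(SAFE_DB_SUFFIXES):
        raise UnsafeTargetError(f"refusing to run against database {db_name!r}")
```

A setup script would call `assert_safe_target(url)` before dropping or seeding anything, so pasting the wrong credentials ends in an error message instead of an empty production database.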

If an Intern Can Break Production on Day One, Your Business Is in Trouble

If a person can accidentally delete an entire database on their first day, your development process is not well thought out. It makes no sense to give full write access, or the ability to deploy new application versions, to someone who has been at the company for only a month. Not knowing the processes makes mistakes very easy, and a company whose business is built on software must be prepared for that.

Something as simple as restricting what each person can do with environments, data, and code can save a lot of trouble. And in case something does go wrong, there should also be a plan B so that nobody panics.
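
As an illustration of what restricting access per person could look like, here is a small deny-by-default sketch. The roles, environments, and actions are invented for the example; in practice these rules would normally live in the database's own user grants and in the deployment tooling rather than in application code.

```python
from enum import Enum


class Role(Enum):
    NEWCOMER = "newcomer"
    DEVELOPER = "developer"
    OPERATOR = "operator"


# Hypothetical policy: the (environment, action) pairs each role may perform.
POLICY = {
    Role.NEWCOMER: {("dev", "read"), ("dev", "write")},
    Role.DEVELOPER: {("dev", "read"), ("dev", "write"), ("prod", "read")},
    Role.OPERATOR: {
        ("dev", "read"), ("dev", "write"),
        ("prod", "read"), ("prod", "write"), ("prod", "deploy"),
    },
}


def is_allowed(role: Role, environment: str, action: str) -> bool:
    """Deny by default: anything not explicitly listed is refused."""
    return (environment, action) in POLICY.get(role, set())
```

With a table like this, a newcomer can do whatever they want to their own development environment, while any write to production is refused unless someone has deliberately granted it.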

There is no golden rule to prevent these problems from occurring. We have all had moments where we made a mistake and deleted the wrong folder, or overwrote data in the database because we forgot to add a condition to the query.

Is it a problem? Yes. Is it the end of the world? Not at all. The way to keep it from becoming one is to maintain backups of everything that is critical to the business (code and data, for example). It is also important to establish a protocol for creating a new development environment, whether for a new team member or simply because a machine has been formatted and everything needs to be reinstalled. And most importantly: you need a plan in case something goes wrong. Because in this world of software development, as they say: shit happens.
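
Backups only help if they exist and are recent. The sketch below assumes, purely for illustration, that backups are written as `*.dump` files into a single directory; the function names and the 24-hour threshold are made up. It reports whether the newest backup is fresh enough, which is the kind of check a scheduled job could run to raise an alarm. It does not prove that a backup can actually be restored; only a real restore test does that.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: Path, now: Optional[float] = None) -> Optional[float]:
    """Return the age in hours of the most recent *.dump file, or None if there is none."""
    now = time.time() if now is None else now
    # Collect modification times of all backup files in the directory.
    mtimes = [p.stat().st_mtime for p in backup_dir.glob("*.dump") if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def backup_is_fresh(backup_dir: Path, max_age_hours: float = 24.0,
                    now: Optional[float] = None) -> bool:
    """True only if at least one backup exists and the newest is recent enough."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is not None and age <= max_age_hours
```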
