Backup Sucks, Why Can’t We Move On?

“Tape Sucks, Move On” (Data Domain)

“Don’t Backup. Go Forward.” (Rubrik)

“Don’t even mention backup in our slogan” (Every other company)

Everybody hates backup — executives, users, and administrators. Even backup companies hate it (at least their slogan writers do). Organizations run backup only because they have to protect the business. I’ve met hundreds of frustrated backup customers who have tried snapshots, backup appliances, cloud, backup as a service, and scores of other “fixes”. They all ask one question –

“Why is backup so painful?!?”

Performance: “I’m Givin’ Her All She’s Got, Captain!”

Backup is painful because it is slow and there is so much data.

Companies expect the backup team to:

  1. Back up PBs of data for thousands of applications every day
  2. Not affect application performance (compute, network, and storage)
  3. Spend less on the backup infrastructure (and team)
  4. Rinse and Repeat next year with twice as much data

Everybody underestimates the cost of backups. While at EMC, a federal agency (no way I’m naming this one) complained about their backup performance. In their words, “The data trickles like an old man’s piss.” They were using less than 1% of the Data Domain’s performance. Their production environment, however, was running harder than Tom Cruise (and just as slow). When they set up their application environment, they hadn’t thought about backup. To meet their application and backup SLAs, they had to buy 4x the equipment and run backups 24 hours a day. NOTE: Unless you can pay for IT gear with tax dollars, I would not depend on that approach.

Backups run for a long time and they use a lot of resources. Teams have to balance application performance with backup SLAs across vast oceans of data. It’s an impossible balancing act. That’s why backup schedules are so complex.
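A quick back-of-envelope calculation shows why the balancing act is so hard. The numbers below are illustrative assumptions (1 PB of data, an 8-hour overnight window), not figures from any vendor:

```python
# Back-of-envelope: sustained throughput needed to back up a data set
# inside a nightly window. All numbers are illustrative assumptions.

def required_throughput_gbps(data_tb: float, window_hours: float) -> float:
    """Sustained GB/s needed to move `data_tb` terabytes in `window_hours`."""
    data_gb = data_tb * 1000          # TB -> GB (decimal units)
    window_s = window_hours * 3600    # hours -> seconds
    return data_gb / window_s

# 1 PB (1,000 TB) in an 8-hour overnight window:
rate = required_throughput_gbps(1000, 8)
print(f"{rate:.1f} GB/s sustained")   # ~34.7 GB/s, every single night
```

Sustaining tens of GB/s of read traffic against production storage, every night, without hurting application performance is exactly the impossible act described above.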

Backup will be painful until we solve the performance problem. Imagine that you could make a backup in an instant. You could set a simple schedule (e.g. hourly) and not worry. Users could create extra copies whenever they wanted. Backup would be painless!

That was the promise of snapshots. Of course, they ran into the next problem.

Multiple Offsite Copies: “Scotty, Beam Us Up”

Backup is painful because you need to keep many offsite copies.

Companies expect their backup teams to:

  1. Store daily backups, so they can restore data from any day in the past months or years
  2. Restore applications if something happens to the hardware, the data center, or the region
  3. Spend less on the backup infrastructure (and team)

That’s why snapshots were never enough. Customers who lost their production system lost their snapshots. Replicating snapshots to a second array didn’t solve the problem, either…

At NetApp, a sales representative asked me to calm Bear Stearns. The director of IT complained that the backup solution (SnapVault to another NetApp system) cost more than the production environment. “You’re lucky that we don’t have to worry about money at Bear Stearns.” (Good times!) Then, he peppered me with questions about exotic failures: hash collisions, solar flares, and quantum bit flips. Our salesman had asked me to “distract him” from these phantasms, so I did. “I wouldn’t worry about those issues. We’re way more likely to corrupt data with a software bug. And that would corrupt your production and backup copies.” The blood drained from the customer’s face and he stopped asking questions (Mission accomplished!). As we left, the salesman snarled, “Next time, try to distract the customer by saying something good about our product.”

Companies store backups on alternate media (tape, dedupe disk, cloud) for reliability at a reasonable cost. That’s why backup software translates data into proprietary formats tuned for that media. The side effect is that only your backup software can read those copies. Result: Backup vendor lock-in!

Backup will be painful until we can solve the problems of performance and storing offsite copies. Imagine that you could make a resilient, secure offsite backup in an instant. You could make a simple schedule and recover from anything. Backup would be painless!

Until, of course, you met an application owner.

Silos: “Resistance is Futile”

Backup is painful because you have to connect the backup process to the application teams.

Companies expect their backup teams to:

  1. Work across all applications in the environment
  2. Respond quickly to application requests
  3. Spend less on the backup infrastructure (and team)

As difficult as technology is, connecting people is even more challenging. Application owners don’t trust what they can’t see or control.

At one EMC World, I hosted a session for backup administrators and DBAs. At first, it was a productive discussion. One DBA explained, “If you can’t recover the database, it’s still my application that’s down. That scares me.” The group started brainstorming ways to give DBAs more visibility into the backups. Then a DBA blurted out, “I just can’t trust you guys with my database backups. You became backup admins because you weren’t smart enough to be DBAs. I’m going to keep making my own local database dumps.” After that, we decided to try to solve the wrestling feud between Bret Hart and Shawn Michaels instead. It seemed more productive.

Companies need to manage complex backup schedules and create offsite copies. That’s why we have backup software. Backup software and schedules are so complex that companies hired backup teams to manage them. That extra layer is why business application owners don’t trust the backups.

Backup will be painful until application teams can trust and verify the backups of their applications.

Moving On? “I canna’ change the laws of physics”

Why is backup so painful?

It’s slow and expensive. It locks you into a backup vendor. It creates a backup silo that slows the business down. Other than that, backup is great.

Why have 25 years of innovative companies failed to eliminate the pain of backup?

Because we couldn’t change the laws of physics in the data center. Too much data. Too expensive to get data offsite. Too hard to connect backup teams and application teams.

Why am I optimistic for the future?

Because the cloud changes the laws of physics for backup. We can stop tweaking backup and finally fix it. We’ll save that mystery for next time.

Cloud Data Protection is Business Protection

“As I waded through a lake of rancid yogurt, each vile step fueled my rage over failed backups.” The server that ran a yogurt manufacturer’s automated packaging facility crashed. The IT team could recover some of the data, but not all. They hoped everything would “be OK”. When they restarted the production line, they learned that hope is not a plan. Machines sprayed yogurt like a 3-year-old with a hose in a crowded church. By the time they shut down the line, they’d created a yogurt lake. It took two months to clean and re-certify the factory. They missed their quarterly earnings. People lost their jobs.

Data protection matters because data recovery matters. Even in the cloud. Especially in the cloud.

Businesses Run on Data

Digital transformation has turned every company into an application business.

Have you ever thought about the lightbulb business? Osram manufactured lightbulbs for almost a century. Then, LED bulbs decimated the lightbulb replacement business. Osram evolved into a lighting solution company. Osram applications optimize customers’ lighting for their houses, businesses, and stadiums. Now a high-tech company, they sold the traditional lightbulb manufacturing business in 2017.

How about the fruit business? Driscoll’s has grown berries for almost 150 years. In 2016, berries were the largest and fastest-growing retail produce category. Driscoll’s leads the market. They credit their “Driscoll’s Delight Platform”. It tracks and manages the berries from the first mile (growing) through the middle miles (shipping) to the last mile (retail consumer). Driscoll’s analyzes data at every stage to optimize the production and consumption of berries. Driscoll’s is a technology company that sells berries.

Every company is in the application business. Applications need data. To design lighting, Osram uses data about your house. To deliver the best berries, Driscoll’s analyzes data about the farms (e.g. soil, climate), shipping (e.g. temperature and route), and customer preferences.

Modern businesses depend on applications. Applications depend on data. Therefore, modern businesses depend on data.

Data Protection: Because of Bad Things and Bad People

Every company protects their data center because there are so many ways to lose data.

CIOs have seen their companies suffer through catastrophes. Hurricane Harvey flooded Houston data centers. Hardware fails and sometimes catches fire. Software bugs corrupt data. People delete the presentation before the biggest meeting of their lives, then throw a stone at a wasps’ nest to incite a swarm and get rushed to the hospital with dozens of vicious stings, just for an excuse to reschedule (or so I’ve heard).

IT organizations have also survived deliberate attacks. External hackers strike for fun and profit. Ransomware has become mainstream; cyber criminals can now subscribe to Ransomware as a Service! Now, anybody can become a hacker. Some attacks happen from inside, too. A terminated contractor at an Arizona bank destroyed racks of systems with a pickaxe. (I’ll never forget the dumbfounded CIO muttering, “We think he brought the pickaxe from home.” Because that’s what mattered.)

After decades of enduring data loss, IT knows to protect the data center. Do we also need to protect data in the cloud?

Data Protection: Bad Things and Bad People Affect the Cloud

Every company needs to protect their data in the cloud because there are even more ways to lose it.

Bad things happen in the cloud. First, users still make mistakes. The cloud provider is not responsible for recovering from user error. Second, the cloud is still built of hardware and software that can fail. Vendors explain, “Amazon EBS volumes are designed for an annual failure rate (AFR) of between 0.1%–0.2%, where failure refers to a complete or partial loss of the volume.” The applications you lose may be unimportant… or they may decimate your business. Third, since you are sharing resources, performance issues can affect data access. Amazon Prime Day is the most recent example. Finally, storms trigger data loss in a public cloud data center, just like they do in a corporate data center.
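Taking that published AFR at face value, expected volume failures scale linearly with fleet size. A minimal sketch (the fleet sizes are assumptions for illustration):

```python
# Expected annual volume failures for a fleet, given a published
# 0.1%-0.2% annual failure rate (AFR). Fleet sizes are assumptions.

def expected_failures(num_volumes: int, afr: float) -> float:
    """Expected volume failures per year at the given annual failure rate."""
    return num_volumes * afr

for fleet in (100, 1_000, 10_000):
    low = expected_failures(fleet, 0.001)   # 0.1% AFR
    high = expected_failures(fleet, 0.002)  # 0.2% AFR
    print(f"{fleet:>6} volumes: {low:.1f}-{high:.1f} expected failures/year")
```

At 10,000 volumes, that is 10 to 20 complete or partial volume losses per year. Whether any given loss is trivial or catastrophic depends entirely on what was on the volume.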

Public clouds are a bigger target for bad actors. Aggressive nations (with names that rhyme with Russia, Iran, North Korea, and China), bitcoin miners, and traditional criminals hack companies running in the cloud. Those hacks obliterate companies. Hackers deleted Code Spaces’ data in AWS. Two days later, the business shut down. Meanwhile, the scope of the public cloud makes internal threats more serious. The pickaxe (or virus)-wielding employee can now damage hundreds of companies instead of one!

Data is not any safer in the cloud than it is on-premises. Cloud providers try to protect your data, but it’s not enough. Even in the cloud, it’s your data. It’s your business. It’s your responsibility.

Protect the Cloud Data, Protect the Business

Modern businesses run on applications. Applications run on data. Most companies that lose data go out of business in 6 months or less.

Unfortunately, bad things and bad people destroy, steal, or disable access to the data. Whether you run on-premises or in the cloud, one day you will lose data. If you have a good backup and disaster recovery solution, you can recover the data. Your business can survive.

Amazon CTO Werner Vogels declared, “Everything fails all the time.” Companies need to protect their data in the cloud, so they can recover from those failures. Now, more than ever.