Why Urgency for Cloud Went to 11

What the Executives Aren’t Telling You

Congratulations! You own “the cloud platform” for your company. Maybe you applied for the role. Maybe you got volunteered. Most of you are just doing the job because somebody has to.

Regardless, your job is simple: lay tracks in front of a speeding freight train without getting flattened. (I said the job is simple, not easy.)

Why did the company put you in this position? Why are they asking you to move legacy workloads? And why are they pushing so hard now?

The #1 reason I hear from cloud practitioners is: “Because my Management said so.” If you want to be successful, that answer is not good enough. You need to know why the company wants to use public cloud, so you know how they’re measuring success… and you.

Your boss, talking about cloud. Courtesy: Bryan Valenza

Why Public Cloud?

Why are most companies adopting cloud?

Agility.

They aspire to move faster than their competitors. Executives imagine that first to the cloud will get the “multi-cloud, serverless, Kubernetes, microservices, automated, agile, synergistic, digital transformation, IT modernization orgasm of profit!”*

Buzzwords aside, there are real benefits to cloud. It helps companies develop, deploy, and scale applications. It shifts technology costs from large irregular capital expenses to predictable operational expense. Underneath the hype, cloud has value. That’s why it’s growing.

* NOTE: These are actual statements from actual CEO/CIO/CFOs.

The Executive Conference Room for “Orgasm of Profit” Courtesy: Disney

Why Move Old Workloads to Public Cloud?

If the business wants to move forward faster, why spend time on legacy applications?

Critical Mass.

Companies have legacy environments, private cloud, and public cloud. The legacy runs the business. Most IT professionals are experts in one legacy discipline — e.g. compute, storage, networking. Since people want to feel useful, they focus on their silo in the legacy environment. That’s why the public cloud never gets enough attention from IT. The only way to drive critical mass to the cloud is to force IT to move the legacy applications to the cloud. And if that saves the company capital expense on equipment and data centers, bonuses for everyone!*

* NOTE: “Everyone” being only those with access to the conference room dedicated to the “orgasm of profit”.

The business pressure to move to cloud now is real. Courtesy: South Park

Why are Companies Moving NOW?

Why is management putting so much stress on moving to cloud now?

They’re not. It just feels that way. You moved the EASY workloads to the cloud. Moving the next workloads will be HARD. But the schedule is the same. That’s stressful.*

Executives have been pushing for agility and savings via cloud for years. First, companies adopted SaaS for basic functions. Second, they moved test and development to cloud. Third, they stored cold data in the cloud.

Now that you’ve done the “easy” work, it’s time for the hard job — moving real applications. Real applications keep persistent customer data in databases and files. Real applications are complex. Real applications need availability, security, data protection, and predictable performance. Real applications run the business. (Don’t panic, though. There are many real applications to move before getting to SAP and Oracle.)

Executives are hooked on cloud wins. Those wins “prove” that they’re innovating and beating the competition. The savings feel good, too. At each hardware refresh cycle, moving to the cloud cuts capital expenses. The savings from each cloud step funds the next one. It doesn’t matter that each step gets more difficult. Everything depends on the next hit of capital savings. That’s why executives need you to deliver the next step… now.

* NOTE: I took a class taught by Turing Award winner Michael Rabin. He spent half of each lecture covering simple arithmetic. At the end, he raced through complex math proofs. We asked why he spent so much time on the simple math vs. the hard math. His answer: “It’s all simple to me.” That’s how executives think about cloud. It’s all simple to them.

Most executives thought Spinal Tap was a documentary. Courtesy: knowyourmeme.com

Conclusion

Businesses need to move to the cloud to compete. It’s not enough to just build some cloud-native applications. They need critical mass on the cloud. That’s why they’re asking IT to migrate legacy workloads.

IT feels tremendous pressure from the business because the next cloud migrations will be hard. There are no more easy wins. You’ve done SaaS, test and development, and archive. Now, it’s time to move business applications. They’re complicated. They have data. They run the business. And they need to be moved now.

Congratulations on owning the cloud platform! Keep running, the train is always coming.

How to Begin Your Cloud Career

Codeword: Agile

The #1 question people used to ask: “How can I get management to buy into my idea?”

Now it’s: “How can I get management to buy into my idea about cloud?”

Then they talk about their attempts to sway their bosses. I’m not surprised they’re not succeeding. I’m surprised that they haven’t been fired.

Don’t jump in front of a runaway cloud train. Courtesy: Thomas the Tank Engine

What Not To Do

If you’re about to use any of these approaches, stop yourself. Even if you have to tap into your inner Tyler Durden and knock yourself out.

 

Here’s Why It Won’t Work!!

You’ve seen the cloud plan. Your company has been playing with the cloud — test and development and some cloud-native toy applications. It’s gone well. Now they’re planning to run applications with data (aka — real applications).

Now is your moment! You warn everybody that there’s a looming disaster. There’s no plan for handling the storage failures (0.1% of devices) … or backup … or security. There’s no strategy to avoid vendor lock-in. And they sideline you. What?!

Lesson: Everybody has bought in, and you can’t stop the train. Nobody wants to hear why the train will derail. Instead of seeming wise, you sound like you’re protecting your job.

It’s Going to Be Too Expensive!!

This is a favorite criticism of the cloud. Especially from legacy IT vendors. The argument goes:

  • A well-run IT department can deliver the same services at a cheaper price.
  • You’re paying for flexibility in the cloud, so it must be more expensive.

Despite this wisdom, the business units ignore you.

Lesson: Cloud isn’t about cutting costs. Businesses are frustrated with IT’s lack of agility, and cloud lets them move faster. Since you’ve just aligned yourself with IT, you’re now “part of the problem”.

This is how the business thinks of IT. Courtesy: theodyssesyonline.com

If You Give Me 6 People and 6 months, I Can “Do It Right”

Businesses are already “swiping a credit card” and running in the cloud. You asked for a team of people and time to come up with a plan. That sounds like you’re using a legacy approach to design a new environment. They hear warning bells, and find somebody who will do it faster with fewer people.

Lesson: Executives like cloud because there’s no lead time. If you’re going to appeal to them, you can’t talk in quarters or even months. Think weeks.

It’s your boss when you bring up new tech. We both know it. Courtesy: imgflip.com

Let me Try this New Technology!

You know Docker, Kubernetes, and/or MongoDB would help the company develop applications faster. Somehow. You extol the virtues of Docker Overlay Networks, Kubernetes StatefulSets, and eventual consistency NoSQL databases. Unfortunately, your boss refuses to commit and asks you to write up a report. You know nothing is going to happen.

Lesson: Your managers do not have grounding in the new technology, so they feel insecure. They were probably last “hands-on” with VMs. They’re not going to risk their necks for something they don’t understand.

Summary

Don’t be negative. Don’t be slow. Don’t make your boss feel stupid.

(Before you laugh, be honest. How many times have you broken these rules?)

Be Agile. Agile is Awesome! Courtesy: IDG Connect

What To Do

Be Agile. Agile is the term of the day. Executives, businesses, and managers love the word and what it symbolizes. Everybody wants to move faster and cheaper. Everybody wants to “Be Agile.”

To change your approach, follow this formula:

  1. Explain your business value (bonus points if you use the word agile!)
  2. Bring solutions to the problems

A More Resilient Cloud Makes the Business More Agile

Business Value: A more resilient cloud environment makes us more agile. With a resilient cloud, we can lift-and-shift existing applications. Without it, we need to re-architect everything to be cloud-native. That will be slow and expensive.

Problem: AWS has 0.1% Storage Failure Rate for Block Storage.

Solution: We should mirror the block devices. We should make backups on the resilient object storage in multiple clouds.

Don’t worry, business units will learn to love best practices. Courtesy: pastoralmeanderings.blogspot.com

Centralized Cloud Best Practices Make the Business More Agile

Business Value: Central management of the cloud makes us more agile. Business units won’t have to figure out what cloud configuration works best with trial and error. We’ll do that work, so they can focus on building revenue-generating applications.

Problem: Each business unit is buying its own cloud resources. There are billions of combinations. They don’t have the time to figure out what works best. They’re picking something and hoping it’s reasonable.

Solution: A small central team can work on best practices. We can even A-B test across groups to find out what works best.

Give your developers the cloud EASY button. Courtesy: concertocloud.com

A Simpler Cloud Makes Developers More Agile

Business Value: We can use cloud for more applications, if we give the application teams a more mature environment. Otherwise, they need to learn to build microservices before they’re productive.

Problem: The cloud lacks data management: availability, performance management, and data protection. The application teams have to build data management into their apps. The extra work slows them down.

Solution: We will build cloud data management, so more application developers can be productive.

You can help the lightbulb go on for your boss. Courtesy: Disney

New Technology Can Help Us Be More Agile With Cloud Providers

Business Value: Running in multiple clouds gives us leverage against any one vendor. We can run different applications in different clouds.

Problem: It’s a big learning curve to run in different clouds.

Solution: Technologies like Kubernetes and Docker can help virtualize the cloud. They do for public cloud what VMware did for servers. Let me just walk you through how it might work… (Now you have your chance to educate them!)

Conclusion

“How can I get management to buy into my idea about cloud?” is the right question. Cloud is the future.

You just need to know how to approach management. Don’t be “Dr. No” or “Dr. Slow”. That’s what they don’t like about IT. They’ve fought for cloud and they want people who will fight for them and their success.

Give them:

  • Agile Business Value
  • Problem
  • Solution

And if you think you’re saying “Agile” too often… you’re not. Don’t roll your eyes. Agility is the rare buzzword that actually delivers value to the business.

Agile is Awesome.

Cloud Didn’t Kill Storage

But it did just lose your data

At a big IT event, I asked, “If you care about storage management, raise your hand.” I saw a smattering of hands.

Next, “Put your hand down if you aren’t a storage admin.”

One hand stayed in the air.

I asked the lone holdout, “Why do you care about storage management?”

“What? No. I need a raffle ticket. There’s a raffle at the end of this talk, right?”

Storage management is the least appreciated part of IT infrastructure… which is saying something. Business users understand server and network issues. Security and backup teams overwhelm them with graphic horror stories. The storage team just gets a budget cut and requirements to store twice as much data.

An average day for a Storage Administrator. Photo Courtesy: Expedia Norway

What is Storage Management?

When you buy storage, you have to consider (at least) five factors:

  1. Capacity — How much data do I want to store?
  2. Durability — How much do I not want to lose my data?
  3. Availability — How long can I go without being able to read or write my data?
  4. Performance — How fast do I need to get at my data?
  5. Cost — How much am I willing to pay?

Different products optimize for different factors. That’s why you see hundreds of storage products on the market. It’s why you see individual vendors sell dozens of storage products. There is no one-size-fits-all product.

Storage management is meeting the storage needs of all the different business applications.

Storage management is also knowing that you’ll always fall short.

If only the cloud storage choices were so obvious... Courtesy: Imgur.com:pleple28

Doesn’t Cloud Make Storage Management Go Away?

No.

The same five factors still matter to your applications, even in the cloud.

That’s why cloud providers offer different types of storage. AWS alone offers:

  1. Local Storage (3 types): Hard Drive, Flash, NVMe Flash
  2. Block Storage (4 types): IOPS Flash (io1), General Purpose Flash (gp2), Throughput Optimized Hard Drive (st1), and Cold Hard Drive (sc1)
  3. Object Storage (4 types): Standard S3, Standard Infrequently Accessed S3, One-Zone Infrequently Accessed S3, Glacier

That’s 11 types of storage for one cloud. Has your head started to spin?

It gets worse. You face all the old storage management challenges plus some new ones.

Wait, Cloud Makes Storage Management MORE Important?

Yes.

Cost Overruns — Overprovisioning

It’s easy for application teams to run up a massive cloud storage bill.

On-premises environments built up checks-and-balances. The application team asks the storage management team for storage resources. The storage team makes them justify the request. The storage management team needs to buy more hardware. Purchasing makes them justify the request. The process slows down the application team, but it prevents reckless consumption and business surprises.

Cloud environments wipe away the checks-and-balances. Application owners pick a type of cloud storage for their application. The cloud provider tells them how much performance they get for each GB of capacity. They see that it costs a dime or less a month. They don’t have to ask anyone for approval. So they buy a big pool of storage. Why not? Turn on the faucet. It’s cheap.

Then they need more capacity. So they buy more. Turn on the faucet again. And it’s cheap.

Courtesy: Raphael Cushnir

Then they need more performance. So they buy more. Turn on the faucet again and again and again. And it’s still cheap.

The bill for 100s of TBs of storage comes in at the end of the month. Nobody turned off the faucet. They’ve run up the water bill and flooded the house.

It’s not cheap anymore.
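The faucet math is easy to sketch. The snippet below is a rough illustration only: it assumes a flat price of $0.10 per GB per month (the "dime" above), and the purchase sizes are made up. Real pricing varies by storage type and region.

```python
# Rough sketch of how "cheap" per-GB pricing adds up.
# PRICE_PER_GB_MONTH is the "dime" from the text; purchase sizes are hypothetical.
PRICE_PER_GB_MONTH = 0.10
GB_PER_TB = 1000  # decimal TB keeps the arithmetic simple


def monthly_bill(purchases_tb):
    """Return the monthly storage bill in dollars for a list of purchases (in TB)."""
    total_gb = sum(purchases_tb) * GB_PER_TB
    return total_gb * PRICE_PER_GB_MONTH


# Hypothetical month: an initial pool, more capacity, then repeated performance top-ups.
purchases_tb = [50, 50, 40, 40, 40, 40, 40]
print(f"{sum(purchases_tb)} TB -> ${monthly_bill(purchases_tb):,.0f} per month")
```

No single purchase in that list looks alarming. The total does.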

Cost Overruns — Storage Silos

It’s easy for application teams to waste money in the cloud.

Twenty years ago, each server had its own storage. Server A used 1% of its storage. Server B used 100% of its storage and ran out. Server B couldn’t use Server A’s storage. Then storage teams adopted shared storage, SAN or NAS. Now they can give storage resources to any application or system that needs it. Shared storage eliminates the waste from the islands of server storage.

Today, cloud environments don’t share storage.

When I create a cloud instance (aka server), I buy storage for it.

When I create a second cloud instance, I buy storage for it.

When I create a third cloud instance, I buy storage for it!

When I … you get the idea.

We’ve re-created the “island of storage” problem. Except at cloud scale, application teams end up with island chains of storage. Even Larry Ellison doesn’t have archipelago money.
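Here is a small sketch of the island problem with made-up numbers. Each instance gets its own volume, and free space on one cannot help another. The instance names and sizes are hypothetical.

```python
# Hypothetical per-instance volumes: (provisioned GB, used GB).
volumes = {
    "instance-a": (1000, 10),    # the "Server A" case: 1% used
    "instance-b": (1000, 1000),  # the "Server B" case: full
    "instance-c": (1000, 250),
}

provisioned = sum(p for p, _ in volumes.values())
used = sum(u for _, u in volumes.values())
stranded = provisioned - used  # paid for, but unusable by the full instance

print(f"provisioned={provisioned} GB, used={used} GB, stranded={stranded} GB")
print(f"utilization={used / provisioned:.0%}")

# Instances that are out of space even though the fleet has plenty free.
full = [name for name, (p, u) in volumes.items() if u >= p]
print("out of space:", full)
```

With shared storage, the stranded capacity would be available to the full instance. With islands, the only fix is to buy more.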

Data Loss — Trusting the Wrong Type of Storage

It’s easy for application teams to lose their data in the cloud.

On-premises environments use resilient shared storage systems. Application owners don’t think about RAID, mirroring, checksums, and data consistency tools. That’s because the storage management team does.

In the cloud, picking storage is like a “Choose Your Own Adventure” story, where you always lose:

  • You choose local storage. When the node goes away, so does your storage. One day you shut down the node to save money. You lost all your data. Start over.
  • You choose block storage. AWS states, “Amazon EBS volumes are designed for an annual failure rate (AFR) of between 0.1% — 0.2%, where failure refers to a complete or partial loss of the volume.” You’ll lose a volume – i.e. an application’s data. Start over.
  • You choose S3 object storage. You chose resilient storage! You win! **

** Oops. Your app can’t run with slow performance. You lost your customers. Start over.

NOTE: Backups can help. But you should have resilient production storage and backups. Backups should be your last resort, not your first option.
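To see why "you’ll lose a volume" is not an exaggeration, here is a back-of-the-envelope sketch. It assumes volume failures are independent and uses the AFR range from the AWS quote above. The fleet sizes are hypothetical.

```python
# Chance of losing at least one volume in a year, assuming independent failures.
# AFR values come from the AWS quote above; fleet sizes are hypothetical.

def p_any_loss(afr, volumes):
    """Probability that at least one of `volumes` volumes fails within a year."""
    return 1 - (1 - afr) ** volumes


for volumes in (10, 100, 1000):
    low = p_any_loss(0.001, volumes)
    high = p_any_loss(0.002, volumes)
    print(f"{volumes:>5} volumes: {low:.1%} to {high:.1%} chance of losing at least one")
```

Under these assumptions, a fleet of a thousand volumes is more likely than not to lose at least one in any given year.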

It's not too late to manage storage for containers in the cloud. Courtesy: teeturtle.com

When Should I Plan for Cloud Storage Management?

Now.

It’s time to start thinking about storage management in the cloud.

Today, you’re running cloud applications that don’t need to store data. Or that don’t care if they lose data. Or you’ve just been lucky.

The time to manage cloud storage is coming. You need to start planning now. It can save your applications, your business and your career.

Oh, and don’t spend time worrying about the raffle. They’re always rigged for the customer with the biggest deal pending, anyway.

Why Can’t You Find a Good IT Job?

The Risks for the Five Types of IT Infrastructure Jobs

It hurts to hunt for a job in IT infrastructure right now. Every rejection finds new ways to embarrass and frustrate you. Even the offers carry painful tradeoffs. Cloud has changed the job options for infrastructure engineers. There are no perfect jobs, but there are opportunities.

I’ve seen five types of companies hiring infrastructure engineers. Each has rewards and risks.

Legacy Whale Infrastructure Companies haven't been doing so well. Photo Credit: outoftheboxscience.com

Legacy Whale — On-Premises Tech Giants

Legacy infrastructure companies put profit over growth — including yours. Their market may be shrinking, but they still run enterprises’ most important applications. The last company standing will charge a premium for their technology. That’s why the legacy giants need engineers to deliver products for their core markets.

The positives:

  • Salary. They pay good salaries from their profit margins.
  • Enterprise Experience. You learn how to work with a mature product for enterprise customers.

The risks:

  • Layoffs. Profit comes when you earn more than you spend. Products that need incremental development don’t need expensive engineers.
  • Stagnation. You’re working on the same product for the same customers. Everything is incremental. You’re missing sweeping technical and business trends.
  • Left too Late. If you stay too long, interviewers will wonder why. Were you too lazy to move? Too comfortable? Nobody wanted you?

The legacy whales can be a lucrative home, and they teach you how to work with big customers. You just have to ask, “When is the right time to jump ship?”

On-premises startups need to move faster than even Chuck Norris. Photo credit: imgflip.com

Legacy Piranha — On-Premises Startups

Legacy piranha companies have to grow fast. The legacy market may be shrinking, but it’s still huge. The legacy whales can’t always move fast enough to block small companies (either with technology or sales). Some piranhas can eat enough of the whales to IPO or get bought.

The positives:

  • System View. You design products from scratch, so you can see new parts of the system.
  • Customer Experience. In a smaller company, you can work directly with customers.
  • Financial Upside. If the company takes off, so does your equity.

The risks:

  • Limited growth. Piranhas need you to do what you’ve done before. The race is on, and they can’t afford to train you on something else.
  • No market. In a shrinking market, everything has to be perfect. The product. The go-to-market. And you need the whale to miss you. For every Pure or Rubrik, there are a dozen Tintri and Primary Data.

The legacy piranhas can be an exciting gamble. You can see the whole system and work with hands-on customers. You just have to ask, “What happens if this fails?”

The Big 3 Cloud Providers are hungry for more. Photo Credit: killer-whale.org

Killer Whales — The Big 3 in Public Cloud

The killer whales (AWS, Azure, Google Cloud) control the new ocean of IT infrastructure. They’re taking share in the growing market of public cloud. The customers, requirements, and technology are different from the legacy environment. Their scale dwarfs even the largest enterprises. The problems are the same, but the rules are different.

The positives:

  • New Technology. Killer whales mix commodity technology with bleeding edge. They must innovate to stay ahead.
  • New Perspective. The scale is orders of magnitude greater than what we’re used to. The integration of the stack eliminates our siloed view.
  • Growth. The killer whales can afford to pay and give new opportunities.

The risks:

  • Getting Hired. They have their pick of new hires. They may see your experience as a limitation, since they want to build things in a new way.
  • Succeeding. The environment is different. The way you did things won’t work. They’re moving fast. You’re going to be very uncomfortable.
  • Limited Customer Interaction. At their scale, it’s difficult to get direct customer interaction. You’re one of the masses building for the masses.

The killer whales will be an exciting ride that sets you up for the future. You just have to ask, “Am I ready?”

Inside the belly of a whale, you can go places. But you're still stuck inside a whale. Photo Credit: Pinocchio.

Inside the Blue Whales — Joining IT

Some of the biggest companies in the world build their own IT infrastructure. They create some of the most interesting infrastructure innovation (e.g. Yahoo, Google, Facebook, Medtronic, Tesla). Nothing makes infrastructure requirements more real than building an application on top of it.

The positives:

  • New Technology. You’re building custom technology because vendors’ products don’t work for them.
  • New Perspective. The scale and integration with business applications changes how you view infrastructure.
  • Growth. You could move from infrastructure to building the application.

The risks:

  • Getting Hired and Making a Difference. See “Killer Whales”.
  • You’re a Cost Center. When you build the product, you are the business. When you provide services for the product, you’re a cost center. At Morgan Stanley, an IT member advised me, “Don’t work here. We’re the most innovative technical company on Wall Street, but we’re still the help. The traders are the business. Never be the help.”

The Blue Whales are technology users that push the boundaries of infrastructure. You just have to ask, “Am I comfortable being a cost center?”

Building services on top of public cloud is exhilarating and terrifying. Photo Credit: Sarah Kim

Riding the Killer Whales — Building on the Public Cloud

The Killer Whales can’t do everything well. No matter how quickly they hire, they can’t build decades of functionality in a few years. Furthermore, nobody wants to lock into one Killer Whale. They know how that story ends. That’s why companies are adding multi-cloud infrastructure services on top of the public cloud.

The positives:

  • New Technology. You’re riding the new technology, trying to tame it.
  • New Perspective. You learn how companies are trying to use public cloud and what challenges they face. You can see how they’re evolving from legacy to public cloud.
  • Upside. If the company takes off, so do you. You’re the expert in a new market area. Oh, and the financial equity will be rewarding, too.

The risks:

  • No Market. You have the traditional startup concerns (funding, customers, competitors) and more. You worry that the killer whales will add your functionality as a free service. You worry that the killer whales will break your product with their newest APIs. Riding killer whales is scary!
  • Financial Downside. Low salary. Even lower job security.

Some Killer Whale Riders will become the next great technology infrastructure companies. You just have to ask, “How much risk am I comfortable with?”

It only felt like Moses led me to Nuvoloso. Photo Credit: The Ten Commandments

Conclusion

A decade ago, even incompetent IT infrastructure vendors could grow 10% a year because the market was so strong. No more. Today, there are no infrastructure jobs without risk. Of course, there are still great opportunities.

I’m riding the killer whales because I’d gotten disconnected from new technology and new customer challenges. The risk is terrifying, but I’ve never been happier. The choice was right for me.

What did you choose and why?

Backup Sucks, Why Can’t We Move On?

Solving the Mystery of “Why is backup so painful?”

“Tape Sucks, Move On” (Data Domain)

“Don’t Backup. Go Forward.” (Rubrik)

“Don’t even mention backup in our slogan” (Every other company)

Everybody hates backup — executives, users, and administrators. Even backup companies hate it (at least their slogan writers do). Organizations run backup only because they have to protect the business. I’ve met hundreds of frustrated backup customers who have tried snapshots, backup appliances, cloud, backup as a service, and scores of other “fixes”. They all ask one question –

“Why is backup so painful?!?”

 

Performance: “I’m Givin’ Her All She’s Got, Captain!”

No matter how hard you run, backup isn’t fast enough. Photo Credit: Mission Impossible 4

Backup is painful because it is slow and there is so much data.

Companies expect the backup team to:

  1. Back up PBs of data for thousands of applications every day
  2. Not affect application performance (compute, network, and storage)
  3. Spend less on the backup infrastructure (and team)
  4. Rinse and Repeat next year with twice as much data

Everybody underestimates the cost of backups. While I was at EMC, a federal agency (no way I’m naming this one) complained about their backup performance. In their words, “The data trickles like an old man’s piss.” They were using less than 1% of the Data Domain’s performance. Their production environment, however, was running harder than Tom Cruise (and just as slow). When they set up their application environment, they hadn’t thought about backup. To meet their application and backup SLAs, they had to buy 4x the equipment and run backups 24 hours a day. NOTE: Unless you can pay for IT gear with tax dollars, I would not depend on that approach.

Backups run for a long time and they use a lot of resources. Teams have to balance application performance with backup SLAs across vast oceans of data. It’s an impossible balancing act. That’s why backup schedules are so complex.
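The scale of that balancing act is easier to see with numbers. This is a rough sketch, assuming a hypothetical 1 PB daily backup and decimal units. It ignores incrementals and deduplication, which real environments rely on.

```python
# Sustained throughput needed to move a given amount of data in a backup window.
# The 1 PB figure and the window lengths are hypothetical illustrations.
BYTES_PER_PB = 10**15
BYTES_PER_GB = 10**9


def required_gb_per_sec(data_pb, window_hours):
    """Average GB/s needed to copy `data_pb` petabytes within `window_hours`."""
    seconds = window_hours * 3600
    return data_pb * BYTES_PER_PB / BYTES_PER_GB / seconds


for window_hours in (24, 8):
    rate = required_gb_per_sec(1, window_hours)
    print(f"1 PB in {window_hours} h needs about {rate:.1f} GB/s sustained")
```

Shrinking the window to keep backups away from peak application hours raises the required rate proportionally, which is where the scheduling pain comes from.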

Backup will be painful until we solve the performance problem. Imagine that you could make a backup in an instant. You could make a simple schedule (e.g. hourly) and not worry. Users could create extra copies whenever they wanted. Backup would be painless!

That was the promise of snapshots. Of course, they ran into the next problem.

Offsite backups are painful. Image Credit: www.glasbergen.com

Multiple Offsite Copies: “Scotty, Beam Us Up”

Backup is painful because you need to keep many offsite copies.

Companies expect their backup teams to:

  1. Store daily backups, so they can restore data from any day from the past months or years
  2. Restore the applications if something happens to the hardware, the data center, or the region.
  3. Spend less on the backup infrastructure (and team)

That’s why snapshots were never enough. Customers who lost their production system lost their snapshots. Replicating snapshots to a second array didn’t solve the problem, either…

At NetApp, a sales representative asked me to calm Bear Stearns. The director of IT complained that the backup solution (SnapVault to another NetApp system) cost more than the production environment. “You’re lucky that we don’t have to worry about money at Bear Stearns.” (Good times!) Then, he peppered me with questions about exotic failures, e.g. hash collisions, solar flares, and quantum bit flips. Our salesman had asked me to “distract him” from these phantasms, so I did. “I wouldn’t worry about those issues. We’re way more likely to corrupt data with a software bug. And that would corrupt your production and backup copies.” The blood drained from the customer’s face and he stopped asking questions (Mission accomplished!). As we left, the salesman snarled, “Next time, try to distract the customer by saying something good about our product.”

Companies store backups on alternate media (tape, dedupe disk, cloud) for reliability at a reasonable cost. That’s why backup software translates data into proprietary formats tuned for that media. The side effect is that only your backup software can read those copies. Result: Backup vendor lock-in!

Backup will be painful until we can solve the problems of performance and storing offsite copies. Imagine that you could make a resilient, secure offsite backup in an instant. You could make a simple schedule and recover from anything. Backup would be painless!

Until, of course, you met an application owner.

Nobody trusts the backup silo. Image Source: focusu.com

Silos: “Resistance is Futile”

Backup is painful because you have to connect the backup process to the application teams.

Companies expect their backup teams to:

  1. Work across all applications in the environment
  2. Respond quickly to application requests
  3. Spend less on the backup infrastructure (and team)

As difficult as technology is, connecting people is even more challenging. Application owners don’t trust what they can’t see or control.

DBA vs. Backup Admin

At one EMCWorld, I hosted a session for backup administrators and DBAs. At first, it was a productive discussion. One DBA explained, “If you can’t recover the database, it’s still my application that’s down. That scares me.” The group started brainstorming ways to give DBAs more visibility into the backups. Then a DBA blurted out, “I just can’t trust you guys with my database backups. You became backup admins because you weren’t smart enough to be DBAs. I’m going to keep making my own local database dumps.” After that, we decided to try to solve the wrestling feud between Bret Hart and Shawn Michaels instead. It seemed more productive.

Companies need to manage complex backup schedules and create offsite copies. That’s why we have backup software. Backup software and schedules are so complex that companies hired backup teams to manage them. That extra layer is why business application owners don’t trust the backups.

Backup will be painful until application teams can trust and verify the backups of their applications.

Cloud is going to change things for backup. Image Source: tallbloke.wordpress.com

Moving On? “I canna’ change the laws of physics”

Why is backup so painful?

It’s slow and expensive. It locks you into a backup vendor. It creates a backup silo that slows the business down. Other than that, backup is great.

Why have 25 years of innovative companies not eliminated the pain of backup?

Because we couldn’t change the laws of physics in the data center. Too much data. Too expensive to get data offsite. Too hard to connect backup teams and application teams.

Why am I optimistic for the future?

Because the cloud changes the laws of physics for backup. We can stop tweaking backup and finally fix it. We’ll save that mystery for next time.

Merry Misadventures in the Public Cloud

Seven Costly Cloud Catastrophes in Seven Days

My first Amazon Web Services (AWS) bill shocked and embarrassed me. I feared I was the founding member of the “Are you &#%& serious, that’s my cloud bill?” club. I wasn’t. If you’ve recently joined, don’t worry. It’s growing every day.

The cloud preyed on my worst IT habits. I act without thinking. I overestimate the importance of my work (aka rampaging ego). I don’t clean up after myself. (Editor’s note: These bad habits extend beyond IT). The cloud turned those bad habits into zombie systems driving my bill to horrific levels.

When I joined Nuvoloso, I wanted to prove myself to the team. I volunteered to benchmark cloud storage products. All I needed to do was learn how to use AWS, Kubernetes, and Docker, so I could then install and test products I’d never heard of. I promised results in seven days. It’s amazing how much damage you can do in a week.

 

Sometimes too much is too much. Photo Credit: Danny Sullivan

Overprovisioning - Acting without Thinking

I overprovisioned my environment by 100x. The self-imposed urgency gave me an excuse to take shortcuts. Since I believed my on-premises storage expertise would apply to cloud, I ran full speed into my first two mistakes.

Mistake 1: Overprovisioned node type.

AWS has dozens of compute node configurations. Who has time to read all those specs? I was benchmarking storage, so I launched 5 “Storage Optimized” instances. Oops. They’re called “Storage Optimized” nodes because they offer better local storage performance. The cloud storage products don’t use local storage. I paid a 50% premium because I only read the label.

Mistake 2: Overprovisioned storage.

You buy on-premises storage in 10s or 100s of TB, so that’s how I bought cloud storage. I set a 4 TB quota of GP2 (AWS’ flash storage) for each of the 5 nodes, 20 TB in total. The storage products, which had been built for on-premises environments, allocated all the storage. In fact, they doubled the allocation to do mirroring. In less than 5 minutes, I was paying for 40 TB. It gets worse. The benchmark only used 40 GB of data. I had so much capacity that the benchmark didn’t measure the performance of the products. I paid a 1000x premium for worthless results!
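
The arithmetic behind that bill is worth spelling out. Here is a minimal sketch using only the figures from the story; the variable names are mine, and decimal units (1 TB = 1000 GB) are an assumption.

```python
# Figures come from the story above; the names are illustrative.
NODES = 5
QUOTA_TB_PER_NODE = 4
MIRROR_FACTOR = 2        # the storage products doubled the allocation for mirroring
BENCHMARK_DATA_GB = 40
GB_PER_TB = 1000         # assumption: decimal units

requested_tb = NODES * QUOTA_TB_PER_NODE
allocated_tb = requested_tb * MIRROR_FACTOR
overprovision_ratio = allocated_tb * GB_PER_TB / BENCHMARK_DATA_GB

print(f"requested: {requested_tb} TB")
print(f"allocated after mirroring: {allocated_tb} TB")
print(f"paid for {overprovision_ratio:.0f}x the data the benchmark touched")
```

Running the numbers before clicking "launch" would have taken less time than reading this paragraph.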

Eventually, you have to clean up the mess. Photo Credit: Reuters

Just Allocate A New Cluster - Ego

I allocated 4x as many Kubernetes clusters as I needed.

When you’re trying new products, you make mistakes. With on-premises systems, you have to fix the problem to make progress. You can’t ignore your burning tire fire and reserve new lab systems. If you try, your co-workers will freeze your car keys in mayonnaise (or worse).

The cloud eliminates resource constraints and peer pressure. You can always get more systems!

Mistakes 3 & 4: “I’ll Debug that Later” / “Don’t Touch it, You’ll Break It!”

Day 1: Tuesday. I made mistakes setting up a 5-node Kubernetes cluster. I told myself I’d debug the issue later.

Day 2: Wednesday. I made mistakes installing a storage product on a new Kubernetes cluster. I told myself I’d debug the issue later.

Day 3: Thursday. I made mistakes installing the benchmark on yet another Kubernetes cluster running the storage. I told myself that I’d debug the issue later.

Day 4: Friday. Everything worked on the 4th cluster, and I ran my tests. I told myself that I was awesome.

Days 5 & 6: Weekend. I told myself that I shouldn’t touch the running cluster because it took so long to set up. Somebody might want me to do something with it on Monday. Oh, and I’d debug the issues I’d hit later.

Day 7: Monday. I saw my bill. I told myself that I’d better clean up NOW.

In one week, I had created 4 mega-clusters that generated worthless benchmark results and no debug information.

"Terminate Instance" - I do not think it means what you think it means. Photo Credit: Princess Bride

Clicking Delete Doesn't Mean It's Gone - Cleaning up after Myself

After cleaning up, I still paid for 40TB of storage for a week and 1 cluster for a month.

The maxim, “Nothing is ever deleted on the Internet” applies to the cloud. It’s easy to leave remnants behind, and those remnants can cost you.

Mistake 5: Cleaning up a Kubernetes cluster via the AWS GUI.

My horror story began when I terminated all my instances from the AWS console. As I was logging out, AWS spawned new instances to replace the old ones! I shut those down. More new ones came back. I deleted a subset of nodes. They came back. I spent two hours screaming silently, “Why won’t you die?!?!” Then I realized that the nodes kept spawning because that’s what Kubernetes does. It keeps your applications running, even when nodes fail. A search showed that deleting the AWS Auto Scaling Group would end my nightmare. (Today, I use kops to create and delete Kubernetes clusters).
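
If you want to see why the nodes refused to die, here is a toy model of the kind of reconcile loop an Auto Scaling Group runs. This is not AWS code; the class and method names are invented for illustration. The point is that the group holds a desired count and replaces anything that goes missing, until the group itself is deleted.

```python
import itertools


class ToyScalingGroup:
    """Hypothetical stand-in for an auto scaling group; not an AWS API."""

    def __init__(self, desired):
        self.desired = desired
        self.deleted = False
        self._ids = itertools.count(1)
        self.instances = set()
        self.reconcile()

    def reconcile(self):
        """Launch replacements until the running count matches the desired count."""
        if self.deleted:
            return
        while len(self.instances) < self.desired:
            self.instances.add(f"node-{next(self._ids)}")

    def terminate_all(self):
        """What I did from the console: kill instances, leave the group alone."""
        self.instances.clear()
        self.reconcile()  # the group notices the gap and fills it

    def delete_group(self):
        """What ends the nightmare: remove the group, then its instances."""
        self.deleted = True
        self.instances.clear()


group = ToyScalingGroup(desired=5)
group.terminate_all()
print(len(group.instances))  # the count is back to the desired size, with new names
group.delete_group()
print(len(group.instances))  # nothing left to respawn
```

Terminating instances treats the symptom. Deleting the thing that wants the instances to exist treats the cause.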

Mistake 6: Deleting Instances does not always delete storage

After deleting the clusters, I looked for any excuse not to log into the cloud. When you work at a cloud company, you can’t hide out for long. A week later, I logged into AWS for more punishment. I saw that I still had lots of storage (aka volumes). Deleting the instances hadn’t deleted the storage! The storage products I’d tested did not select the AWS option to delete the volume when terminating the node. I needed to delete the volumes myself.
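
Here is a sketch of how I could have found those leftovers. It assumes the response shape of the EC2 DescribeVolumes call (a `Volumes` list whose entries carry `VolumeId`, `Size` in GiB, and `State`), where a state of `available` means the volume is not attached to any instance. The function takes any client object with a `describe_volumes` method, so the demo runs against a fake; the fake and its data are made up. A real account with many volumes would also need pagination, which this sketch skips.

```python
def find_unattached_volumes(ec2_client):
    """Return (volume_id, size_gib) for volumes not attached to any instance."""
    response = ec2_client.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    return [(v["VolumeId"], v["Size"]) for v in response["Volumes"]]


class FakeEC2:
    """Made-up stand-in so the sketch runs without an AWS account."""

    def __init__(self, volumes):
        self._volumes = volumes

    def describe_volumes(self, Filters=()):
        # Collect the volume states requested by any "status" filter.
        wanted = {s for f in Filters if f["Name"] == "status" for s in f["Values"]}
        return {"Volumes": [v for v in self._volumes if not wanted or v["State"] in wanted]}


fake = FakeEC2([
    {"VolumeId": "vol-aaa", "Size": 4000, "State": "available"},
    {"VolumeId": "vol-bbb", "Size": 100, "State": "in-use"},
])
orphans = find_unattached_volumes(fake)
print(orphans)
```

With boto3 installed and credentials configured, passing a real EC2 client (for example `boto3.client("ec2", region_name=...)`) in place of the fake should behave the same way. Deleting what it finds is a separate, deliberate step.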

Mistake 7: Clean Up Each Region

I created my first cluster in Northern Virginia. I’ve always liked that area. When I found out that AWS charges more for Northern Virginia, I made my next 3 clusters in Oregon. The AWS console splits the view by region. You guessed it. While freaking out about undead clusters, I forgot to delete the cluster in Northern Virginia! When the next month’s sky-high bill arrived, I corrected my final mistake (of that first week).
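
The console shows one region at a time, but a script does not have to. Here is a minimal sketch of a per-region sweep. `count_running`, `sweep_regions`, and `make_client` are hypothetical names, and the fake client exists only so the example runs offline. It assumes the EC2 DescribeInstances response shape (`Reservations`, each with an `Instances` list).

```python
def count_running(ec2_client):
    """Count running instances visible to one regional client."""
    response = ec2_client.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    return sum(len(r["Instances"]) for r in response["Reservations"])


def sweep_regions(regions, make_client):
    """Return {region: running_count} for every region that still has instances."""
    counts = {region: count_running(make_client(region)) for region in regions}
    return {region: n for region, n in counts.items() if n > 0}


class FakeRegionalEC2:
    """Made-up client that reports a fixed number of running instances."""

    def __init__(self, running):
        self._running = running

    def describe_instances(self, Filters=()):
        instances = [{"InstanceId": f"i-{n}"} for n in range(self._running)]
        return {"Reservations": [{"Instances": instances}] if instances else []}


# Invented data: the forgotten cluster lives in us-east-1 (Northern Virginia).
leftover = {"us-east-1": 5, "us-west-2": 0}
report = sweep_regions(leftover, lambda region: FakeRegionalEC2(leftover[region]))
print(report)
```

Anything the sweep reports in a region you thought was empty is a bill waiting to happen.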

Welcome to the Family

Cloud can feel imaginary until that first bill hits you. Then things get real, solid, and painful. When that happens, welcome to the family of cloud experts! Cloud changes how we consume, deploy, and run IT. We’re going to make mistakes (hopefully not 7 catastrophic mistakes in one week), but we’ll learn together. I’m glad to be part of the cloud family. I don’t want to face those undead clusters alone. Bring your boomstick.

Traditional Applications Run Better in Public Cloud

Public Cloud Can Benefit Traditional Applications

Public cloud works. Not just for SaaS, cloud native applications, or test and development. Not just for startups or executives bragging to each other on the golf course. Public cloud works for traditional, stable applications. It can deliver better service levels and reduce costs … even compared to a well-run on-premises environment.

To date, market analysts have focused on cloud disrupting who buys IT infrastructure. Frustrated lines of business pounced on the chance to bypass IT. Cloud let them “Fail Fast or Scale Fast”. They didn’t have to wait for IT approval, change control, hardware acquisition, or governance. Lines of business continue to embrace cloud’s self-service provisioning at a low monthly cost.

 

Do you need a custom-built environment for all your applications?

Can Public Cloud Cost Less than a Well-Run Private Cloud? Yes.

Conventional wisdom says public cloud can’t compete with a well-run on-premises environment. IT architects argue that public cloud can’t match the performance and functionality of legacy environments. IT Administrators can’t tweak low level knobs. IT Directors can’t demand custom releases. How can vanilla cloud handle the complex requirements of legacy applications? Financial analysts note that public cloud charges a premium for its flexible consumption. Stable workloads don’t need that flexibility, so why pay the premium?

Conventional wisdom is wrong. Most traditional workloads don’t need custom-built environments. You don’t need a Formula-1 race car to pick up groceries, and you don’t need specially-made infrastructure to run most applications. Moreover, public cloud’s architectural advantages can reduce IT costs, even with the pricing premium.

Public cloud has two architectural advantages for traditional applications:

  1. More price/performance options
  2. On-demand provisioning for data protection

One size does not fit all for application infrastructure

Public Cloud = Choice

Public cloud offers more price/performance choices than on-premises infrastructure. Outside of the Fortune 50, most companies don’t get to buy “one of everything” for their infrastructure. Instead, they buy a one-size-fits-all workhorse system to support all the workloads. The public cloud offers more technology choices than even the largest IT shop. It is the biggest marketplace (pun intended) for different technology configurations. Cloud levels the playing field between smaller and bigger companies.*

* NOTE: For this to happen, we need to solve the operational challenges of running different cloud configurations.

Cloud will change how you think about data protection

Public Cloud = Better Data Protection

Public cloud can improve data protection. For years, IT has struggled to deliver high-performance disaster recovery, backup, and archive. Companies can’t afford to run DR and archive environments for all their applications; maintaining two near-identical sites costs too much. That’s why they pretend that their backups can be DR and archive copies. Unfortunately, when disasters or (even worse) legal issues strike, recovery cannot begin until IT provisions a new environment. Companies collapse before recoveries can complete.

Public cloud’s on-demand provisioning enables cost-effective first-class DR, archive, and backup. Customers don’t waste money on idle standby environments. Nor do they treat “hope that nothing goes wrong” as a strategy. Instead, when necessary, they near-instantly spin up compute and storage in a new location. Then, they near-instantly restore the data and start running.* With public cloud, IT can unify enterprise-class DR, backup, and archive. Organizations are already moving backup copies to cloud object storage. The next step will be to use those copies for unified data protection.

*NOTE: For this to happen, we must create cost-effective cloud protection storage and build near-instant data recovery mechanisms.

Conclusion

Public cloud works for traditional applications. You can run applications on the best configuration, rather than what is available. You can have first-class DR and archive, rather than “best effort” with backup copies. You can replace your hand-crafted environments with something less expensive and more functional. Public cloud should not threaten IT; instead, its architecture should help IT deliver better services. It’s time to stop resisting and start building.