How Cloud Apps Go Wrong

Data Can Bring You Down

Welcome to the Cloud

It’s exciting! You’ve got a huge new toolbox with a cornucopia of choices. And forums with a bonanza of advice.

Whether you are creating, migrating, or something in-between, everybody has an opinion about building cloud apps:

  • Development Model: Serverless? Micro-services? Containerized? Traditional?
  • Platform: Cloud-Provider? PaaS? IaaS packages?
  • Database Stores: MongoDB? PostgreSQL? DynamoDB? S3?

While you dig through the avalanche of advice, there is one topic conspicuous by its absence… what is the right way to handle persistent data?

Data storage in the cloud is clean and easy? Courtesy: https://www.flickr.com/photos/jonwestra78

Day 1: How Should I Manage My Application's Data?

There are even more ways to store data in the cloud than there were on-premises!

You can use:

  • Database as a Service: AWS RDS, MongoDB Atlas, etc.
  • Standalone Databases: PostgreSQL, MongoDB, MySQL, etc.
  • Files: local files or cloud NAS
  • Objects: more tiers than an Instagram wedding cake

Most applications use a combination of tools to get the best results.

Once you decide how to store the data, you need to provision for capacity and performance. You might even consider how you’ll scale, deal with failures, secure the data, and meet compliance regulations (Hey, I’m an optimist).

It’s a stressful decision. Data is “the new oil”, “the lifeblood of business”, and “the new bacon”. You want to pick the right way to manage your data; your business depends on that information. Unfortunately, with so many variables, making the right decision seems impossible. Still, you’ll pick something reasonable, and then you’ll never have to worry about it again. Phew.

Except… data doesn’t take care of itself. In the cloud, there is no storage, backup, disaster recovery, or compliance team watching over your data. If you care about your application, you’ll be managing its data every day of the application’s life.

Even the Gingerbread man can’t escape data management. Courtesy: https://society6.com/product/attack-of-the-gingerbread-man-ii_wall-clock

Day 2: A Day in the Life of a Cloud App Developer

Congratulations! You’ve released an application into your environment. You’ve created something that helps people. Take a moment to celebrate.

Now, take a deep breath. Your life will never be the same. Here’s a day in your life.

06:30: Make sure backups are happening and that they’re being replicated off site. Consider testing a restore to make sure the backups are good. Resolve to do it tomorrow… for the 100th straight day. Push down the guilt.

10:30: You get an escalation about application performance. After looking at processing and network, check out the data path. Figure out which storage resources are serving the application. Look at historical data to find changes in workload, resource performance, etc. Hope that application performance goes back to normal on its own.

12:10: Discuss with others how to scale the application. If you add more compute resources, how can they share the same data? Will that become a bottleneck?

14:07: Somebody hits a bug in the application. Create a clone environment, so you can reproduce the bug and test a fix. Wait hours to clone or create a data set.

16:20: Finance has asked you to reduce the cost of running your application. Explore ways to either use less expensive storage or use it more efficiently/dynamically.

18:30: As you walk out of the building, wonder when you will have time to ever build a new application.

23:30: Security has identified a flaw in the configuration of some AWS S3 buckets. They want to make sure your data is secure and encrypted. You need to send them details of what you’re doing and how. They also demand full audit logs of who has access to the data and what they’ve done.

06:30: Do it all over again…
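The 06:30 restore test you keep postponing can start very small. Here is a minimal sketch, assuming a "backup" is just a file copy and a "restore" is a copy back into a scratch directory; real backup tools do far more, and the file names and the restore_matches_source helper are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def restore_matches_source(source, backup_copy, scratch_dir):
    """Restore the backup into scratch_dir and compare it to the source."""
    restored = Path(scratch_dir) / "restored"
    shutil.copy2(backup_copy, restored)  # stand-in for a real restore step
    return sha256(restored) == sha256(source)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    source = tmp / "app.db"            # hypothetical application data file
    source.write_bytes(b"customer records")
    backup = tmp / "app.db.bak"
    shutil.copy2(source, backup)       # "take a backup"

    ok = restore_matches_source(source, backup, tmp)

    backup.write_bytes(b"corrupted")   # simulate a bad backup
    bad = restore_matches_source(source, backup, tmp)

print(ok, bad)
```

The point is only that a backup you have never restored is a hope, not a backup; comparing checksums after a restore is one cheap way to find out.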

The cloud offers infinite tools for building applications: UI, queuing, analytics, etc. Without data management, however, you won’t have time to create new applications. Instead, you’ll become a de-facto platform administrator: managing storage, backup, and compliance for your existing apps.

There is another way. Credit: Unknown

Day 2, Take 2: A Better Day

It may feel like you only have two options:

  • Bad: Spend every day in data management hell.
  • Worse: Use only one type of storage — e.g. object, DBaaS, etc.

There is a third answer: a data management solution that does the job of managing storage, data protection, and compliance for you.

In this world, today is not the worst day of your life…

06:30: Breakfast. [Backups have been happening automatically every 4 hours.]

10:30: Develop your new application. [Storage resources automatically scaled up to meet your existing application’s performance needs.]

12:10: Eat lunch and talk with co-workers about your new application.

14:07: Somebody hits a bug in the application. Instantly clone the environment to reproduce and test a fix; send the fix through the CI/CD pipeline.

16:20: [Storage resources scale down as the load reduces.]

18:30: Kick off a set of tests with real data for your new application; head home.

23:30: Sleep. [Your data is secure and the records are available to security, auditors, etc.]

Aim for the head and take control of data in the cloud. Courtesy: https://wallpapercave.com/thor-lightning-wallpapers

The First Day of the Rest of Your Life

Cloud offers a tantalizing array of options to help build applications. Without data management, those tools will torment you because you won’t have time to use them. Plan ahead, though, and you can find a better way. Day 2 can be the best day of your life, not a nightmare.

Cloud should give you choice. Cloud should give you automation. Cloud should make you more productive. If that’s not happening, it’s time to look for a cloud-first data management solution.

Why Urgency for Cloud Went to 11

What the Executives Aren’t Telling You

Congratulations! You own “the cloud platform” for your company. Maybe you applied for the role. Maybe you got volunteered. Most of you are just doing the job because somebody has to.

Regardless, your job is simple: lay tracks in front of a speeding freight train without getting flattened. (I said the job is simple, not easy.)

Why did the company put you in this position? Why are they asking you to move legacy workloads? And why are they pushing so hard now?

The #1 reason I hear from cloud practitioners is: “Because my Management said so.” If you want to be successful, that answer is not good enough. You need to know why the company wants to use public cloud, so you know how they’re measuring success… and you.

Your boss, talking about cloud. Courtesy: Bryan Valenza

Why Public Cloud?

Why are most companies adopting cloud?

Agility.

They aspire to move faster than their competitors. Executives imagine that first to the cloud will get the “multi-cloud, serverless, Kubernetes, microservices, automated, agile, synergistic, digital transformation, IT modernization orgasm of profit!”*

Buzzwords aside, there are real benefits to cloud. It helps companies develop, deploy, and scale applications. It shifts technology costs from large irregular capital expenses to predictable operational expense. Underneath the hype, cloud has value. That’s why it’s growing.

* NOTE: These are actual statements from actual CEO/CIO/CFOs.

The Executive Conference Room for “Orgasm of Profit” Courtesy: Disney

Why Move Old Workloads to Public Cloud?

If the business wants to move forward faster, why spend time on legacy applications?

Critical Mass.

Companies have legacy environments, private cloud, and public cloud. The legacy runs the business. Most IT professionals are experts in one legacy discipline — e.g. compute, storage, networking. Since people want to feel useful, they focus on their silo in the legacy environment. That’s why the public cloud never gets enough attention from IT. The only way to drive critical mass to the cloud is to force IT to move the legacy applications to the cloud. And if that saves the company capital expense on equipment and data centers, bonuses for everyone!*

* NOTE: “Everyone” being only those with access to the conference room dedicated to the “orgasm of profit”.

The business pressure to move to cloud now is real. Courtesy: South Park

Why are Companies Moving NOW?

Why is management putting so much stress on moving to cloud now?

They’re not. It just feels that way. You moved the EASY workloads to the cloud. Moving the next workloads will be HARD. But the schedule is the same. That’s stressful.*

Executives have been pushing for agility and savings via cloud for years. First, companies adopted SaaS for basic functions. Second, they moved test and development to cloud. Third, they stored cold data in the cloud.

Now that you’ve done the “easy” work, it’s time for the hard job — moving real applications. Real applications keep persistent customer data in databases and files. Real applications are complex. Real applications need availability, security, data protection, and predictable performance. Real applications run the business. (Don’t panic, though. There are many real applications to move before getting to SAP and Oracle.)

Executives are hooked on cloud wins. Those wins “prove” that they’re innovating and beating the competition. The savings feel good, too. At each hardware refresh cycle, moving to the cloud cuts capital expenses. The savings from each cloud step fund the next one. It doesn’t matter that each step gets more difficult. Everything depends on the next hit of capital savings. That’s why executives need you to deliver the next step… now.

* NOTE: I took a class taught by Turing Award winner Michael Rabin. He spent half of each lecture covering simple arithmetic. At the end, he raced through complex math proofs. We asked why he spent so much time on the simple math vs. the hard math. His answer: “It’s all simple to me.” That’s how executives think about cloud. It’s all simple to them.

Most executives thought Spinal Tap was a documentary. Courtesy: knowyourmeme.com

Conclusion

Businesses need to move to the cloud to compete. It’s not enough to just build some cloud-native applications. They need critical mass on the cloud. That’s why they’re asking IT to migrate legacy workloads.

IT feels tremendous pressure from the business because the next cloud migrations will be hard. There are no more easy wins. You’ve done SaaS, test and development, and archive. Now, it’s time to move business applications. They’re complicated. They have data. They run the business. And they need to be moved now.

Congratulations on owning the cloud platform! Keep running. The train is always coming.

Merry Misadventures in the Public Cloud

Seven Costly Cloud Catastrophes in Seven Days

My first Amazon Web Services (AWS) bill shocked and embarrassed me. I feared I was the founding member of the “Are you &#%& serious, that’s my cloud bill?” club. I wasn’t. If you’ve recently joined, don’t worry. It’s growing every day.

The cloud preyed on my worst IT habits. I act without thinking. I overestimate the importance of my work (aka rampaging ego). I don’t clean up after myself. (Editor’s note: These bad habits extend beyond IT). The cloud turned those bad habits into zombie systems driving my bill to horrific levels.

When I joined Nuvoloso, I wanted to prove myself to the team. I volunteered to benchmark cloud storage products. All I needed to do was learn how to use AWS, Kubernetes, and Docker, so I could then install and test products I’d never heard of. I promised results in seven days. It’s amazing how much damage you can do in a week.

 

Sometimes too much is too much. Photo Credit: Danny Sullivan

Overprovisioning - Acting without Thinking

I overprovisioned my environment by 100x. The self-imposed urgency gave me an excuse to take shortcuts. Since I believed my on-premises storage expertise would apply to cloud, I ran full speed into my first two mistakes.

Mistake 1: Overprovisioned node type.

AWS has dozens of compute node configurations. Who has time to read all those specs? I was benchmarking storage, so I launched 5 “Storage Optimized” instances. Oops. They’re called “Storage Optimized” nodes because they offer better local storage performance. The cloud storage products don’t use local storage. I paid a 50% premium because I only read the label.

Mistake 2: Overprovisioned storage.

You buy on-premises storage in 10s or 100s of TB, so that’s how I bought cloud storage. I set a 4 TB quota of GP2 (AWS’ flash storage) for each of the 5 nodes, 20 TB in total. The storage products, which had been built for on-premises environments, allocated all the storage. In fact, they doubled the allocation to do mirroring. In less than 5 minutes, I was paying for 40 TB. It gets worse. The benchmark only used 40 GB of data. I had so much capacity that the benchmark didn’t measure the performance of the products. I paid a 1000x premium for worthless results!
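For the skeptics, here is the back-of-the-envelope math as a small sketch. It assumes decimal units (1 TB = 1000 GB) and uses only the numbers above; the function and variable names are mine, for illustration.

```python
# Back-of-the-envelope math for Mistake 2 (assumes 1 TB = 1000 GB).
NODES = 5
QUOTA_TB_PER_NODE = 4
MIRROR_FACTOR = 2        # the products doubled the allocation for mirroring
BENCHMARK_DATA_GB = 40


def provisioned_tb(nodes, quota_tb, mirror_factor):
    """Total capacity paid for, in TB."""
    return nodes * quota_tb * mirror_factor


def overprovision_ratio(provisioned_total_tb, used_gb):
    """How many times more capacity was provisioned than used."""
    return (provisioned_total_tb * 1000) / used_gb


total_tb = provisioned_tb(NODES, QUOTA_TB_PER_NODE, MIRROR_FACTOR)
ratio = overprovision_ratio(total_tb, BENCHMARK_DATA_GB)
print(f"Provisioned {total_tb} TB for {BENCHMARK_DATA_GB} GB of data: {ratio:.0f}x")
```

Running the same two functions before clicking "Launch" would have taken less time than reading the bill.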

Eventually, you have to clean up the mess. Photo Credit: Reuters

Just Allocate A New Cluster - Ego

I allocated 4x as many Kubernetes clusters as I needed.

When you’re trying new products, you make mistakes. With on-premises systems, you have to fix the problem to make progress. You can’t ignore your burning tire fire and reserve new lab systems. If you try, your co-workers will freeze your car keys in mayonnaise (or worse).

The cloud eliminates resource constraints and peer pressure. You can always get more systems!

Mistakes 3 & 4: “I’ll Debug that Later” / “Don’t Touch it, You’ll Break It!”

Day 1: Tuesday. I made mistakes setting up a 5-node Kubernetes cluster. I told myself I’d debug the issue later.

Day 2: Wednesday. I made mistakes installing a storage product on a new Kubernetes cluster. I told myself I’d debug the issue later.

Day 3: Thursday. I made mistakes installing the benchmark on yet another Kubernetes cluster running the storage. I told myself that I’d debug the issue later.

Day 4: Friday. Everything worked on the 4th cluster, and I ran my tests. I told myself that I was awesome.

Days 5 & 6: Weekend. I told myself that I shouldn’t touch the running cluster because it took so long to set up. Somebody might want me to do something with it on Monday. Oh, and I’d debug the issues I’d hit later.

Day 7: Monday. I saw my bill. I told myself that I’d better clean up NOW.

In one week, I had created 4 mega-clusters that generated worthless benchmark results and no debug information.

"Terminate Instance" - I do not think it means what you think it means. Photo Credit: Princess Bride

Clicking Delete Doesn't Mean It's Gone - Cleaning up after Myself

After cleaning up, I still paid for 40TB of storage for a week and 1 cluster for a month.

The maxim, “Nothing is ever deleted on the Internet” applies to the cloud. It’s easy to leave remnants behind, and those remnants can cost you.

Mistake 5: Cleaning up a Kubernetes cluster via the AWS GUI.

My horror story began when I terminated all my instances from the AWS console. As I was logging out, AWS spawned new instances to replace the old ones! I shut those down. More new ones came back. I deleted a subset of nodes. They came back. I spent two hours screaming silently, “Why won’t you die?!?!” Then I realized that the nodes kept spawning because that’s what Kubernetes does. It keeps your applications running, even when nodes fail. A search showed that deleting the AWS Auto Scaling Group would end my nightmare. (Today, I use kops to create and delete Kubernetes clusters.)
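If you want to see why the nodes kept coming back, here is a toy model of a "desired capacity" loop. It is not real AWS or Kubernetes code; ToyScalingGroup and its methods are hypothetical names I made up to illustrate the behavior I ran into.

```python
# Toy model of a desired-capacity reconcile loop (NOT real AWS or Kubernetes code).
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyScalingGroup:
    desired: int
    instances: set = field(default_factory=set)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def reconcile(self):
        """Launch replacements until the running count matches `desired`."""
        while len(self.instances) < self.desired:
            self.instances.add(f"i-{next(self._ids):04d}")

    def terminate_all(self):
        """What I kept doing from the console."""
        self.instances.clear()


group = ToyScalingGroup(desired=5)
group.reconcile()
first_batch = set(group.instances)

group.terminate_all()   # terminate every node...
group.reconcile()       # ...and the group replaces all of them
assert len(group.instances) == 5
assert group.instances.isdisjoint(first_batch)

group.desired = 0       # stand-in for deleting the group itself
group.terminate_all()
group.reconcile()
print(len(group.instances))
```

Terminating instances only treats the symptom. Nothing stays dead until whatever is asking for five nodes stops asking.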

Mistake 6: Deleting Instances does not always delete storage

After deleting the clusters, I looked for any excuse not to log into the cloud. When you work at a cloud company, you can’t hide out for long. A week later, I logged into AWS for more punishment. I saw that I still had lots of storage (aka volumes). Deleting the instances hadn’t deleted the storage! The storage products I’d tested did not select the AWS option to delete the volume when terminating the node. I needed to delete the volumes myself.
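Hunting for leftover volumes is easy to script. The sketch below assumes volume records loosely shaped like the entries that EC2's describe-volumes call returns (VolumeId, Size in GiB, and a State of "available" when a volume is attached to nothing). The sample data and the find_orphaned_volumes helper are made up for illustration; no AWS call is made here.

```python
def find_orphaned_volumes(volumes):
    """Return volumes that exist (and bill) but are attached to nothing."""
    return [v for v in volumes if v["State"] == "available"]


# Hypothetical sample data; real IDs and sizes will differ.
sample_volumes = [
    {"VolumeId": "vol-example-1", "Size": 4000, "State": "available"},
    {"VolumeId": "vol-example-2", "Size": 4000, "State": "in-use"},
    {"VolumeId": "vol-example-3", "Size": 4000, "State": "available"},
]

orphans = find_orphaned_volumes(sample_volumes)
orphaned_gib = sum(v["Size"] for v in orphans)

for v in orphans:
    print(f'{v["VolumeId"]}: {v["Size"]} GiB, unattached')
print(f"Total orphaned capacity: {orphaned_gib} GiB")
```

Review the list before deleting anything. An unattached volume may still hold the only copy of something you care about.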

Mistake 7: Clean Up Each Region

I created my first cluster in Northern Virginia. I’ve always liked that area. When I found out that AWS charges more for Northern Virginia, I made my next 3 clusters in Oregon. The AWS console splits the view by region. You guessed it. While freaking out about undead clusters, I forgot to delete the cluster in Northern Virginia! When the next month’s sky-high bill arrived, I corrected my final mistake (of that first week).
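The lesson is to sweep every region you have touched, not just the one on screen. Here is a minimal sketch, assuming you have already gathered a per-region inventory. The region codes are the real ones for Northern Virginia (us-east-1) and Oregon (us-west-2), but the resource names and the leftovers_by_region helper are hypothetical.

```python
def leftovers_by_region(inventory):
    """Return only the regions that still have resources in them."""
    return {region: items for region, items in inventory.items() if items}


# Hypothetical inventory after my "cleanup".
inventory = {
    "us-east-1": ["cluster-1"],  # Northern Virginia: the one I forgot
    "us-west-2": [],             # Oregon: actually cleaned up
}

leftovers = leftovers_by_region(inventory)
for region, items in leftovers.items():
    print(f"{region}: still running {', '.join(items)}")
```

An empty result is the only acceptable output before you log out.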

Welcome to the Family

Cloud can feel imaginary until that first bill hits you. Then things get real, solid, and painful. When that happens, welcome to the family of cloud experts! Cloud changes how we consume, deploy, and run IT. We’re going to make mistakes (hopefully not 7 catastrophic mistakes in one week), but we’ll learn together. I’m glad to be part of the cloud family. I don’t want to face those undead clusters alone. Bring your boomstick.