Merry Misadventures in the Public Cloud

My first Amazon Web Services (AWS) bill shocked and embarrassed me. I feared I was the founding member of the “Are you &#%& serious, that’s my cloud bill?” club. I wasn’t. If you’ve recently joined, don’t worry. It’s growing every day.

The cloud preyed on my worst IT habits. I act without thinking. I overestimate the importance of my work (aka rampaging ego). I don’t clean up after myself. (Editor’s note: These bad habits extend beyond IT.) The cloud turned those bad habits into zombie systems driving my bill to horrific levels.

When I joined Nuvoloso, I wanted to prove myself to the team. I volunteered to benchmark cloud storage products. All I needed to do was learn how to use AWS, Kubernetes, and Docker, so I could then install and test products I’d never heard of. I promised results in seven days. It’s amazing how much damage you can do in a week.

Overprovisioning — Acting without Thinking

I overprovisioned my environment by 100x. The self-imposed urgency gave me an excuse to take shortcuts. Since I believed my on-premises storage expertise would apply to cloud, I ran full speed into my first two mistakes.

Mistake 1: Overprovisioned node type.

AWS has dozens of compute node configurations. Who has time to read all those specs? I was benchmarking storage, so I launched 5 “Storage Optimized” instances. Oops. They’re called “Storage Optimized” nodes because they offer better local storage performance. The cloud storage products don’t use local storage. I paid a 50% premium because I only read the label.

Mistake 2: Overprovisioned storage.

You buy on-premises storage in 10s or 100s of TB, so that’s how I bought cloud storage. I set a 4 TB quota of GP2 (AWS’s general-purpose SSD storage) for each of the 5 nodes — 20 TB in total. The storage products, which had been built for on-premises environments, allocated all the storage. In fact, they doubled the allocation to do mirroring. In less than 5 minutes, I was paying for 40 TB. It gets worse. The benchmark only used 40 GB of data. I had so much excess capacity that the benchmark never stressed the products at all. I paid a 1000x premium for worthless results!
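To put rough numbers on it, here is the back-of-the-envelope math. The $0.10 per GB-month figure is an assumption based on gp2's historical us-east-1 list price; check the current AWS pricing page before trusting it.

```python
# Back-of-the-envelope EBS cost sketch. The gp2 price below is an
# assumption (historical us-east-1 list pricing), not a quote.
GP2_PRICE_PER_GB_MONTH = 0.10

provisioned_gb = 40 * 1000   # 40 TB allocated (20 TB quota, doubled by mirroring)
needed_gb = 40               # the benchmark only touched 40 GB

monthly_cost = provisioned_gb * GP2_PRICE_PER_GB_MONTH
needed_cost = needed_gb * GP2_PRICE_PER_GB_MONTH

print(f"Provisioned: ${monthly_cost:,.2f}/month")     # $4,000.00/month
print(f"Actually needed: ${needed_cost:,.2f}/month")  # $4.00/month
print(f"Premium: {provisioned_gb // needed_gb}x")     # 1000x
```

Five minutes of clicking, four grand a month. That is the gap between buying storage like it's an on-premises array and buying what the workload needs.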

Just Allocate A New Cluster — Ego

I allocated 4x as many Kubernetes clusters as I needed.

When you’re trying new products, you make mistakes. With on-premises systems, you have to fix the problem to make progress. You can’t ignore your burning tire fire and reserve new lab systems. If you try, your co-workers will freeze your car keys in mayonnaise (or worse).

The cloud eliminates resource constraints and peer pressure. You can always get more systems!

Mistakes 3 & 4: “I’ll Debug that Later” / “Don’t Touch it, You’ll Break It!”

Day 1: Tuesday. I made mistakes setting up a 5-node Kubernetes cluster. I told myself I’d debug the issue later.

Day 2: Wednesday. I made mistakes installing a storage product on a new Kubernetes cluster. I told myself I’d debug the issue later.

Day 3: Thursday. I made mistakes installing the benchmark on yet another Kubernetes cluster running the storage. I told myself that I’d debug the issue later.

Day 4: Friday. Everything worked on the 4th cluster, and I ran my tests. I told myself that I was awesome.

Days 5 & 6: Weekend. I told myself that I shouldn’t touch the running cluster because it took so long to set up. Somebody might want me to do something with it on Monday. Oh, and I’d debug the issues I’d hit later.

Day 7: Monday. I saw my bill. I told myself that I’d better clean up NOW.

In one week, I had created 4 mega-clusters that generated worthless benchmark results and no debug information.

Clicking Delete Doesn’t Mean It’s Gone — Cleaning up after Myself

After cleaning up, I still paid for 40 TB of storage for a week and 1 cluster for a month.

The maxim, “Nothing is ever deleted on the Internet” applies to the cloud. It’s easy to leave remnants behind, and those remnants can cost you.

Mistake 5: Cleaning up a Kubernetes cluster via the AWS GUI.

My horror story began when I terminated all my instances from the AWS console. As I was logging out, AWS spawned new instances to replace the old ones! I shut those down. More new ones came back. I deleted a subset of nodes. They came back. I spent two hours screaming silently, “Why won’t you die?!?!” Then I realized that the nodes kept spawning because that’s what Kubernetes does. It keeps your applications running, even when nodes fail. A search showed that deleting the AWS Auto Scaling Group would end my nightmare. (Today, I use kops to create and delete Kubernetes clusters).
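The respawning makes sense once you model what an Auto Scaling Group actually does: it holds a desired capacity and launches replacements for anything below that count. This toy sketch (plain Python, not real AWS code) captures why terminating instances never converged:

```python
def reconcile(instances, desired_capacity):
    """Toy model of an Auto Scaling Group's control loop: launch
    replacements until the instance count matches desired capacity."""
    instances = list(instances)
    n = 0
    while len(instances) < desired_capacity:
        instances.append(f"i-replacement-{n}")
        n += 1
    return instances

# Mistake 5: terminate all five instances while the ASG still wants five.
after_terminate = reconcile([], desired_capacity=5)
print(after_terminate)  # five fresh replacements appear

# Delete the Auto Scaling Group first (no more desired capacity),
# and termination finally sticks.
print(reconcile([], desired_capacity=0))  # []
```

In real life the equivalent fix was deleting the Auto Scaling Group itself (`aws autoscaling delete-auto-scaling-group` with `--force-delete` terminates its instances too), or better, letting `kops delete cluster` tear down everything it created in one shot.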

Mistake 6: Deleting Instances does not always delete storage

After deleting the clusters, I looked for any excuse not to log into the cloud. When you work at a cloud company, you can’t hide out for long. A week later, I logged into AWS for more punishment. I saw that I still had lots of storage (aka volumes). Deleting the instances hadn’t deleted the storage! The storage products I’d tested did not select the AWS option to delete the volume when terminating the node. I needed to delete the volumes myself.
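Spotting those leftovers is simple once you know what to look for: a volume in the `available` state is attached to nothing and is pure cost. A sketch of the filtering logic, with a hand-written sample standing in for a real `describe_volumes` response:

```python
def orphaned_volumes(volumes):
    """Return volumes in the 'available' state, i.e. attached to nothing.
    With boto3, the real input would be ec2.describe_volumes()['Volumes'];
    the sample below is a stand-in for illustration."""
    return [v for v in volumes if v["State"] == "available"]

sample = [
    {"VolumeId": "vol-111", "State": "in-use", "Size": 4000},
    {"VolumeId": "vol-222", "State": "available", "Size": 4000},  # orphan
    {"VolumeId": "vol-333", "State": "available", "Size": 4000},  # orphan
]

for v in orphaned_volumes(sample):
    print(v["VolumeId"], f"{v['Size']} GiB, attached to nothing")
```

The longer-term fix is making sure volumes get attached with the `DeleteOnTermination` attribute set, so they die with their instance instead of haunting the bill.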

Mistake 7: Clean Up Each Region

I created my first cluster in Northern Virginia. I’ve always liked that area. When I found out that AWS charges more for Northern Virginia, I made my next 3 clusters in Oregon. The AWS console splits the view by region. You guessed it. While freaking out about undead clusters, I forgot to delete the cluster in Northern Virginia! When the next month’s sky-high bill arrived, I corrected my final mistake (of that first week).
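Because the console only shows one region at a time, the only reliable check is a sweep across every region. A sketch of that loop, using pure logic over illustrative data (in real code, the counts would come from per-region `describe_instances` calls after listing regions with `describe_regions`):

```python
def regions_with_leftovers(inventory):
    """Given {region: running_instance_count}, return the regions that
    still have instances. In real code the counts would come from a
    per-region ec2.describe_instances() call."""
    return sorted(r for r, count in inventory.items() if count > 0)

# Illustrative inventory after my "cleanup":
inventory = {
    "us-east-1": 5,   # the forgotten Northern Virginia cluster
    "us-west-2": 0,   # the Oregon clusters, finally gone
    "eu-west-1": 0,
}
print(regions_with_leftovers(inventory))  # ['us-east-1']
```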

Welcome to the Family

Cloud can feel imaginary until that first bill hits you. Then things get real, solid, and painful. When that happens, welcome to the family of cloud experts! Cloud changes how we consume, deploy, and run IT. We’re going to make mistakes (hopefully not 7 catastrophic mistakes in one week), but we’ll learn together. I’m glad to be part of the cloud family. I don’t want to face those undead clusters alone. Bring your boomstick.

How I Found My Path to the Cloud

“Dell EMC’s Data Protection Division won’t need a CTO in the future.”

I started 2017 as an SVP and CTO at the world’s largest on-premises infrastructure provider. I ended the year at a 10-person startup building data management for the public cloud. Like many, my journey to the cloud began with a kick in the gut. Like most, I have no idea how it will end.

The Dell layoff didn’t depress me. I’d seen the budget cut targets, so I knew I wasn’t alone. The layoff felt personal rather than professional, so my ego wasn’t bruised. Since cloud is eating the on-premises infrastructure market, I’d wanted to move. Since I’d always had my choice of jobs, I looked forward to new opportunities.

The job hunt, however, plunged me into the chasm of despair. I wanted to be cutting edge, so I applied to cloud providers and SaaS vendors. What’s worse than companies rejecting you? Companies never responding. Even with glowing internal introductions from former colleagues, I heard nothing. No interview. No acknowledgement. Not even rejection. My on-premises background made me invisible. Then, I applied to software companies moving to the cloud. They interviewed me. They rejected me for candidates with “cloud” expertise. My on-premises background made me undesirable. Legacy infrastructure companies called, but I needed to build a career for the next 20 years, not to cling to a job for 5 more years. For the first time in my working life, I worried about becoming obsolete.

Then I found hope. I met a recently “promoted” Cloud Architect whose boss wanted him to “move IT to cloud”. His angst-ridden story sounded familiar: change-resistant organization, insufficient investment, and unsatisfactory tools. He couldn’t deliver data protection, data compliance and security, data availability, or performance. He couldn’t afford to build custom data management solutions. The business didn’t even want to think about it. They did, however, expect an answer.

I realized data management was my ticket into the cloud. Even in cloud, data management problems don’t go away. The problems I know how to solve still matter. In fact, expanding digitization and new regulations (e.g. GDPR, ePrivacy Directive) make solving those problems more important. Even better, the public cloud’s architecture opens better ways to build data management. Electricity surged through me. Cloud gave me the opportunity to build the data management solution I’d spent my career trying to create. Now, I needed to find a place to build it.

Nuvoloso, our startup, wants to help people like me get to the cloud. Individually, each member of the team has built data management for client-server, appliances, and virtualization. Now, together, we’re building data management for cloud. The requirements don’t change, but the solutions must. Each of us adds value with our existing skills, while learning about the public cloud. Our product will enable infrastructure IT professionals to follow our path. We will help them use their experience to add value and get a foothold in the cloud.

The journey to the cloud still ties my stomach in knots. When I started at Nuvoloso, I felt helpless and terrified. Cloud took everything I knew, and changed it just enough to confuse me. As I’ve adjusted, I feel helpful, excited and (still) terrified. Public cloud is real. Public cloud changes how businesses buy and use technology. Public cloud does not, however, eliminate the requirements for data management; it amplifies them. Public cloud will not replace us. Public cloud needs our skills and experience. No matter where the applications run, somebody needs to manage the data infrastructure.

Your journey to the cloud may begin with a project, a promotion, or (like me) a layoff. Regardless of how you start, remember: There’s a future for people like us.