Monday, May 6, 2013

April Recap

My trend of posting monthly recaps a few days late continues...  Sorry about that, hopefully the May recap will be on time.  I was traveling most of April, so the posts this month tend to reflect that.

I'll start with the Cloudcast (.net) for the month of April.  We published a record number of episodes. A HUGE thanks to both Amy Lewis and Brian Katz for their amazing contributions!  Amy did a fantastic job as roving reporter and Brian's Mobilecast is really taking off!  As always, please send us any show feedback, we love to hear from you!


Next up is my new TechTarget Blog, you have subscribed with your latest Google Reader replacement, right??  I'm really having a good time writing over there.  This site (aarondelp.com) has always been more about hands-on posts and live blogs from events, but the interest over there in the latest trends around Open Clouds and the operational aspects of cloud computing has been both great and humbling.  Thank you to everyone who has taken the time to read the articles and provide feedback!


The only blogging I was able to do on my site this month was Live Blogs from the AWS event.  Here are all of them.

Tuesday, April 30, 2013

AWS Summit Liveblog: RightScale - Hybrid IT Design

Usual Liveblog disclaimer: typing this as I go in the session, please excuse typos and formatting issues

Title: Hybrid IT - Steps to Building a Successful Model - presented by RightScale
Presenter: Brian Adler, Sr. Services Architect, RightScale & Ryan Geyer, Cloud Solutions Engineer

Brian is services, this won't be a product pitch ;)

RightScale is a CMP (Cloud Management Platform) - provides configuration management, an automation engine, as well as governance controls and does both public and on-premise clouds (I think the word private cloud must be on the naughty list at the show, all pitches do NOT use the dirty "p word")

RightScale allows management & automation across cloud resource pools

basic overview of terminology and where we have come in IaaS to Cloud Computing today

On-Premise Key Considerations

1. Workload and Infrastructure Interaction - what are the resource needs? Does this make sense in the cloud and which size instance would be best?  Instance type is very important
2. Compliance - data may be contained on-prem for compliance
3. Latency - does the consumer require low latency for a good user experience
4. Cost - the faster it has to go (latency) the more expensive it will be in the cloud
5. Cost - What is the CAPEX vs. OPEX and does it make sense

Use Cases

1. Self-Service IT Portal (The IT Vending Machine) - Users select from fixed menu, for example, pre-configured and isolated test/dev

Demo Time - Showing off an example of a portal using the RightScale APIs, basically push a big button, enter a few options, let it spin up an environment, in this example they provisioned five servers and a PHP environment in a few minutes

2. Scalable Applications with Uncertain Demand - This is the typical web scale use case, fail or succeed very fast in the public cloud. "See if it sticks", once it sticks, maybe pull it in house if cost reduction can be achieved when the application is at steady state

3. Disaster Recovery - Production is typically on-premise and the DR environment is in the cloud, this is often considered a "warm DR" scenario - the database replicates in real time from production to DR, all other servers are "down".  You then spin up the other servers (the DB is already up and running), then flip the DNS entries over when DR is up and running.  You can achieve a great RTO & RPO in this example.  You can also do this from one AWS region to another.

Demo Time - Showing RightScale Dashboard with a web app demo + DR.  Demo had 2 databases, master and slave, replicating across different regions (side discussions about WAN optimization and encryption here as well), Production in the example was in US-East AWS and DR was US-West AWS.  The front end of the app was down in West.  When you launch the West DR site, everything is configured automatically as part of the server template.  All DR happens just by turning up the front end in West

Design Considerations

Location of Physical Hardware - again speed vs. latency vs. location

Availability and Redundancy Configuration - This can be easy to hard depending on your needs

Workloads, Workloads, Workloads - Does the application require HA of the infrastructure? Will it tolerate an interruption? Can it go down?  Will users be impacted?

Hardware Considerations - Do you need specialty? commodity?

(Sorry, he had others listed, I zoned out for a slide or two..)

On to Hybrid IT - Most customers start out wanting "cloud bursting" but most often an application is used in one location or the other.  Check out the slide for the reasons.

Common practice is that a workload is all on-premise or all public. Bursting isn't a common use case.  If they do use bursting, they set up a VPC between private and public to maintain a connection.

Demo Time - What would a hybrid bursting scenario look like in the RightScale dashboard?  Customer has a local cloud that is VPC connected to AWS.  Load Balancers, one is private, one is in AWS.  They are using Apache running on top of a virtual machine to maintain compatibility between private and public.  DNS is using Route 53 (AWS DNS).  RightScale uses the concept of an Array.  As RightScale monitors the performance, additional instances are fired up and "bursted" or scaled out to AWS above and beyond the local already running resources.
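To make the bursting idea a bit more concrete, here is a minimal Python sketch of the two decisions described in the demo: resize the array based on a monitored metric, then fill local capacity first and send the overflow to AWS. The function names, the CPU thresholds, and the capacity numbers are all my own assumptions for illustration; this is not RightScale's API or its actual scaling logic.

```python
def desired_size(current, avg_cpu, scale_up_at=0.75, scale_down_at=0.30, minimum=1):
    """Threshold-based resize decision for one monitoring interval (hypothetical thresholds)."""
    if avg_cpu >= scale_up_at:
        return current + 1
    if avg_cpu <= scale_down_at and current > minimum:
        return current - 1
    return current


def place_instances(needed, local_capacity):
    """Fill the private cloud first; anything beyond it "bursts" to the public cloud."""
    if needed < 0 or local_capacity < 0:
        raise ValueError("counts must be non-negative")
    local = min(needed, local_capacity)
    return {"local": local, "aws": needed - local}


# Four instances already fill the local cloud and CPU is running hot,
# so the array grows by one and the extra instance lands in AWS.
size = desired_size(current=4, avg_cpu=0.90)
print(place_instances(size, local_capacity=4))
```

The point of splitting the two functions is that the "how many" decision does not care where the instances live; only the placement step knows about private vs. public.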

You do not need the same LBs on the front end like the example above.  For example, you could be in a local CloudStack/OpenStack environment with a hardware firewall in front but also include AWS and AWS ELB in the rules as well

Take Away - It is very possible to use both public and private and there isn't a need for a "one size fits all approach"

Great session, probably the best session of the day so far for me.




AWS Summit Liveblog: Cloud Backup and DR

Usual Liveblog Disclaimer: This is typed as fast as I can, blog may contain typing and formatting errors, sorry about that

Session: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud (whew, long title)

Presenter: Simone Brunozzi, Technology Evangelist

Simone presented in the morning keynote on the Enterprise demo, good presenter

3 parts = HA -> Backup -> Disaster Recovery

HA = Keeping Services Alive

Backup = Process of keeping a copy

DR = Recover using a backup

(Simone is using great examples with churches and monasteries but they are too long to type out here.)

5 Concepts of DR

1. My backup should be accessible - AWS uses APIs, Direct Connect, customer owns the data, redundancy is built in, AWS has import/export capabilities

AWS Storage Gateway as an example, using a gateway cache volume on-premise that will replicate to a volume in AWS public cloud, S3, snapshots, etc.  Can be a GW-cached or GW-stored (one is a cache, the other is a full offline copy). Secure tunnel for transport over AWS Direct Connect or Internet

2. My backup should be able to scale - "Infinite scale" with S3 and Glacier, scale to multiple regions, seamless, no need to provision, cost tiers (cheaper options and at scale are available)

3. My backup should be safe - SSL Endpoints, signed API calls, stored encrypted files, server-side encryption, durability: multiple copies across different data centers, local/cloud with AWS Storage Gateway

4. My backup should work with my DR policy (I don't want to wait 10 years to recover) - easy to integrate within AWS or Hybrid, AWS Storage Gateway: Run services on Amazon EC2 for DR, clear costs, reduced costs, You decide the redundancy/availability in relation to costs.

5. Someone should care about it - Need clear ownership, permission can be set in IAM with roles, monitor logs

Now a customer story:

Shaw Media - Canadian Media Company, before AWS - multiple datacenters, lot of equipment, downtime, different technologies across datacenters - they were told to change everything and become more agile and cost effective in the next 9 months to better serve the business

Solved the issue with AWS, fast deployment of servers, network rules, and ELB on AWS, first site in only 4 weeks, after that a full migration of 29 sites from a physical DC in 9 months - This was Phase one (This was main websites)

Phase Two - Other web services migration was next (check out the picture for the details), impressive stats.  Typical web servers, apps servers, database servers, etc.


Lessons Learned - went too fast, didn't catch it... damnit

DR - Learn from your outages (test your policy on a regular basis and refine the document)

(Sorry, he's going too fast to type or even take pictures of the slides.... Really wish he would have gone slower in this section, the content was really good grrrrrrr)

Lessons to learn from DR

1. You NEED a DR plan in place - how will you recover?  Can your business survive without it?  For AWS, across Availability Zones (AZ's) or App DR with Standby (see pictures).  The second option is cheaper to implement but will take a little longer to recover from.

 

Perform a business analysis of RTO & RPO (if you don't know what that is, Google it, you need to know what it is).  In a nutshell, RTO (Recovery Time Objective) is how long it takes to get it back, and RPO (Recovery Point Objective) is how much data you can afford to lose.  This is the typical cost vs. performance trade off.  Take the various AWS services as an example:


2. Test your DR - Many may say Duh! to this one but I'm always surprised how few customers actually do this.  The ability to spin up capacity just for DR testing helps to minimize cost, and not having a DR site to manage is pretty cool. Data Transfer speeds (Data Gravity) could be an issue in this kind of scenario

3. Reducing Costs - Took a screenshot, it was easier
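Going back to the RTO/RPO analysis in lesson 1, here is a small Python sketch of the check I have in mind. It assumes the simplest possible model: backups (or replication) run on a fixed interval, so the worst-case data loss is one full interval, and recovery takes a known restore time. The function names and the example numbers are made up for illustration and do not come from the session.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval):
    """A failure just before the next backup loses one full interval of data."""
    return backup_interval


def meets_objectives(backup_interval, restore_time, rpo, rto):
    """True only if both the RPO (data loss) and RTO (downtime) targets are met."""
    return worst_case_data_loss(backup_interval) <= rpo and restore_time <= rto


# Hypothetical targets: lose at most 1 hour of data, be back within 8 hours.
rpo = timedelta(hours=1)
rto = timedelta(hours=8)

# Nightly backups restore quickly enough, but can lose up to a day of data.
nightly = meets_objectives(timedelta(hours=24), timedelta(hours=4), rpo, rto)

# Replicating every 5 minutes to a warm standby meets both targets.
warm_standby = meets_objectives(timedelta(minutes=5), timedelta(minutes=30), rpo, rto)

print(nightly, warm_standby)
```

The tighter you make either number, the more the solution tends to cost, which is the cost vs. performance trade off mentioned above.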


Overall - great presentation although I wish he would have spent more time on the customer slides as there was some good technical content there...




AWS Summit Liveblog: Introducing AWS OpsWorks

Usual liveblog disclaimer, this could be messy, please excuse typos, sorry for that.

Chris Barclay, Product manager for AWS OpsWorks is presenting

Application Management Challenges - Reliability and Scalability are important, operations tasks typically: Provision, Deploy, etc.

"Once Upon a Time..."  - We took the time to develop everything by hand (home made bread)

Today we need to automate to go faster (cranking out automation in a factory like, mass produced way)

In Today's infrastructure, everything is considered code, including the configuration of the "parts", sounds much like a recent Cloudcast we did...

AWS OpsWorks is a tool to tackle this challenge, very reliable and repeatable and integrated with AWS, at no additional cost

Why use OpsWorks?
Simple, Productive, Flexible, Powerful, Secure

Common complaint was there are a lot of AWS "building blocks" but many don't want to stitch them together, AWS at times can be complex because of large number of services offered

Chris turned the presentation over to another person (didn't catch the name) at DriveDev, DevOps consulting group, focus on F500 and startups

He talked about a typical "old school" application development that went poorly. They were able to use built in OpsWorks recipes with the addition of Chef Cookbooks on top of it. Took customer and migrated them off private and into public with OpsWorks in a short amount of time.  Basically, they were a success...

How are customers using OpsWorks today?

From OS to application using OpsWorks, From OS to your code using Elastic Beanstalk, From OS up and automate everything with Chef or another tool

Takeaway - Which tool is best depends on how much automation you need and from what level of the stack up.


Demo Time...

Talking about Chef and how OpsWorks uses it

The concept of Lifecycle events, based on this a recipe is triggered
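As a rough mental model of "an event happens, a recipe is triggered", here is a small Python sketch. The five event names are the lifecycle events OpsWorks documents (setup, configure, deploy, undeploy, shutdown); the Layer class and the recipe names are hypothetical and are not the OpsWorks or Chef API.

```python
# Event names follow the OpsWorks lifecycle events; everything else here
# (the Layer class, the recipe names) is a made-up illustration.
LIFECYCLE_EVENTS = ("setup", "configure", "deploy", "undeploy", "shutdown")


class Layer:
    def __init__(self, name):
        self.name = name
        self._recipes = {event: [] for event in LIFECYCLE_EVENTS}

    def add_recipe(self, event, recipe):
        """Attach a recipe to a lifecycle event; recipes run in the order added."""
        if event not in self._recipes:
            raise ValueError(f"unknown lifecycle event: {event!r}")
        self._recipes[event].append(recipe)

    def recipes_for(self, event):
        """Return the recipes that would be triggered when the event fires."""
        if event not in self._recipes:
            raise ValueError(f"unknown lifecycle event: {event!r}")
        return list(self._recipes[event])


php_layer = Layer("php-app")
php_layer.add_recipe("setup", "myapp::install_packages")
php_layer.add_recipe("deploy", "myapp::migrate_database")
php_layer.add_recipe("deploy", "myapp::restart_web_server")

print(php_layer.recipes_for("deploy"))
```

A deployment only fires the deploy recipes, which is why committing a change and redeploying does not reinstall the whole server.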

 

Showing integration with github, keeps source and cookbooks out on git

Chris did a creation of a stack, PHP app server layer with MySQL on top, then added instances and started them up (could change to multiple AZ's for HA at creation)

After this, there are built-in Chef recipes that can be used, you can also add your own if you need additional functionality, can also add additional EBS volumes if needed, elastic IPs, IAM Instance profiles, etc.

Talked about a time based instance - an instance that only exists during certain times of day, also threshold instances that can be fired up as needed (scaling of an app server based on memory, CPU, network, etc)
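The two instance types boil down to two different "should this instance be running right now?" checks. Here is a hedged Python sketch of both; the function names, the daily window, and the 80% thresholds are my assumptions, not OpsWorks settings.

```python
from datetime import time


def time_based_should_run(now, start, stop):
    """True if `now` is inside the daily window [start, stop); the window may wrap past midnight."""
    if start <= stop:
        return start <= now < stop
    return now >= start or now < stop


def load_based_should_run(cpu, memory, cpu_threshold=0.80, memory_threshold=0.80):
    """True if either monitored metric has crossed its (hypothetical) threshold."""
    return cpu >= cpu_threshold or memory >= memory_threshold


# An instance that only exists during business hours.
print(time_based_should_run(time(9, 30), start=time(8, 0), stop=time(18, 0)))

# An extra app server that only exists while the layer is under load.
print(load_based_should_run(cpu=0.92, memory=0.40))
```

Time-based instances suit predictable patterns (business hours, nightly batch jobs), while the threshold kind covers load you cannot predict ahead of time.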

Added the app from git onto the stack that was built

Chris went from here into deep level git items that were above me (I admit I'm not the target audience here).  The take away, he made a change, committed the change, performed a deployment, looked very easy

Now on to Permissions - talking about various 

What's next?  More integrations with AWS resources (i.e. ELB features) - Deeper VPC, more built-in layers (go vote on their forums, they will prioritize by public opinion)

Summary: OpsWorks for productivity, control, reliability


AWS Summit Keynote Live Blog


This is a live blog from the AWS Summit Keynote by Andy Jassy.  The usual disclaimer applies, I'll be typing fast and furious so expect misspellings and some formatting errors.  Also, no Internet in the keynote (MiFi or conference) so I'll be moving this over to the blog after the keynote.

There are a TON of people at the event (I'll see if they announce numbers but easily in the thousands), impressive

Intro videos going on now…

Andy Jassy is on stage - starts with the age of AWS, 7 years old, March 2006

Now digging into the breadth of the services - they are very proud of the pace of innovation (see pictures attached)

With the exception of 2010, they have doubled the number of services every year, up to almost 160 services available today

71 new features so far in 2013



9 regions, 25 availability zones, 39 edge locations - also talked about the GovCloud and the requirements on it to support Public Sector workloads

Amazon S3 - Over 2 Trillion objects, 1,100,000 peak requests/sec

He's firing facts and figures now so fast I can't keep up. Nothing but speeds and feeds and stats to impress. He's talking very fast

Talking about customers and user base

 

Use cases - talking about how the use case is really about building blocks and letting the developers decide how to stitch together the blocks, AWS was not going to dictate the use cases

Talking about security - security is number one priority at AWS, talking about features access control from the edge, dedicated instances, encryption, etc.

Certifications are more important than security - They are HIPAA, ISO, SOX, FISMA, etc.

Now moving on to pricing (he's talking really fast, no transition in between topics)

They plan to remove cost from process and pass on to customers, 31 price drops to date, the more customers they have, the better economy of scale, they consider this a "wheel" more customers drives price drops which brings in more customers

AWS Trusted Advisor - checks for cost optimizations, security and availability checks, performance recommendations (running on demand vs. reserved instances for instance), pretty cool stuff.  I remember hearing about this but never dug into it.  It appears they are trying to change the mindset about steady state apps, they have brought this up a few times that you can run steady state in cloud, but need to do it on a reserved instance.
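The on demand vs. reserved question is really a break-even calculation, so here is a quick Python sketch of it. It assumes a simple pricing model (an upfront fee plus a lower hourly rate, charged only for hours used) and the prices below are invented for the example, they are not real AWS prices.

```python
def break_even_utilization(on_demand_hourly, reserved_upfront, reserved_hourly, term_hours=8760):
    """Fraction of the term an instance must run before reserved becomes cheaper.

    Break-even hours h satisfy:
        reserved_upfront + reserved_hourly * h == on_demand_hourly * h
    """
    if on_demand_hourly <= reserved_hourly:
        raise ValueError("reserved hourly rate must be lower than on-demand")
    hours = reserved_upfront / (on_demand_hourly - reserved_hourly)
    return hours / term_hours


# Hypothetical prices: $0.10/hr on demand vs. $219 upfront plus $0.05/hr reserved,
# over a one year term (8760 hours).
utilization = break_even_utilization(0.10, 219.0, 0.05)
print(f"reserved wins above roughly {utilization:.0%} utilization")
```

With these made-up numbers, an app that runs all day every day is well past the break-even point, which is the steady state argument they kept coming back to.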

Now on to partners (again, no real transition) - The usual impressive list of both consulting and technology partners

AWS Marketplace - Their "App Store", 25 categories, 778 product listings - applications already configured and certified on the AWS ecosystem

Why are customers adopting cloud computing? (finally, a real transition)

1. Trade Capital Expense (CAPEX) for Operating Expense (OPEX) - $0 to get started and can fail fast if needed
2. Lower Variable Expense than most companies can do in house - they mention again how large they are and the economies of scale to pass on to customers (seems to be their new message) - They appear to be positioning themselves as the "Walmart of the Cloud" - Low Price Leader and pass savings on to you
3. You Don't Need to Guess Capacity - Talking about the typical predict up front model, what happens if you build it and nobody comes? What happens if too many people come?  If the infrastructure is elastic there is no need for this planning and predictive step
4. Dramatically Increase Speed and Agility - Old World server request, usually takes weeks to get servers for development, AWS takes minutes and is all self service - compares development to invention, need to perform a lot of experiments, need to experiment and fail with little to no cost or collateral damage, speeds up development
5. Stop Spending Money on the Undifferentiated Heavy Lifting - They do all the "infrastructure stuff" for you, talking about how the infrastructure typically doesn't differentiate your business in any way but it also consumes a lot of resources in operations.
6. Go Global in minutes - Because of Regions and Availability Zones the ability to scale and grow into a different region is much easier. No need to set up operations in another area of the world

Message is very Enterprise centric (no surprise there)

Sean Beausoleil is on stage now - lead engineer for Mailbox - 2 years ago - talking about their first product, it worked but wasn't "sticky" enough, the reason was that email still held most users' data. How to tackle the mailbox as a better tool for task management

Now a video about Mailbox uses - In case you haven't tried it, Mailbox basically turns your mail into a to-do list. They were overwhelmed with the response to the initial movie that was released as a preview. They needed a massively scalable back end to support it. The product pulls from IMAP -> Cloud -> to device (see picture)
They knew they would need a massive backend on AWS, they copied their existing system to AWS, they found a lot of bottlenecks in the app as they scaled up in testing.  They were able to test AHEAD of production.  Some components of the app were rewritten.  That is why they introduced the reservation system that some of you who got the app may have seen.  (I was on that list)


They created the reservation system so they could scale over time until they were sure they could scale.  Even all this preparation didn't prepare them for the growth.  They were handling 100 million emails a day within 2 months of launch.  They are able to re-architect on the fly, comment was "you can't predict what production will look like until you are in production". I couldn't agree more based on past experience

AWS allowed them to optimize and scale and perform swaps of hardware instance sizes on the fly to balance the usage against the costs.  They would model the workload and perform swaps of hardware seamlessly in the background with no downtime.  I have to admit, that is pretty frckin cool.

Andy is back - AWS adoption into the Enterprise is the topic now

Andy is now talking about how most "old guard" are pushing for private cloud. He states none of the 6 points above are available in private cloud. He says old guard is high margin business that isn't the same as AWS. He is now talking about a balance of "old" on premise resources and new cloud era workloads - talking about Amazon Direct Connect, LDAP integration, VPC, etc. Says these tools to move from on-premise enterprises are the focus going forward. Mentions BMC and CA as partners in the future for single pane of glass management

How are Enterprises using AWS?

Strategy 1: Cloud for Development and Test - first and most common use case
Strategy 2: Build New Apps for the Cloud - this is the next generation of applications. Retire the old and create new apps, faster to build, less expensive to run, easier to manage, etc
Strategy 3: Use Cloud to Make Existing On-Prem Apps Better - Take in house apps and outsource the analytics for example for processing in the cloud. They mentioned a few enterprises including Nasdaq that do this today
Strategy 4: New Cloud Apps that Integrate Back to On-prem systems - AWS serves up the front end and the processing is on the back end on-prem
Strategy 5: Migrate Existing Apps to Cloud - he admits this is emerging and often requires consulting services, taking that very traditional workload and moving it to the cloud
Strategy 6: All in - NETFLIX!  No keynote is complete without them…

Now up - Demo of Enterprise and cloud by Simone Brunozzi
They want to show you how AWS is relevant in the Enterprise
3 parts - Authentication, Integration, Migration

Authentication - Talking about Okta, an AD integration partner, brings AD into AWS. Created an AWS Admins group in AD and it will talk to AWS IAM and perform the changes needed to access AWS - AWS admin rights

Integration - Storage Gateway for Backup and Recovery Volumes - volume on premise - replicates to S3, replication of data happens, stand up an EC2 instance and attach to the volume on AWS if needed - talked about iSCSI targets and how to attach them (that brings back memories). Once this is done you could map back to on-premise (little fuzzy on the details)

Migration - Talking about exporting an image from VMware vCenter on-premise and transferring it to AWS as an image (AMI). From there you can copy it to another region. The example here is to move to the USA first and then transfer to Singapore.  I admit the use case of moving region to region is really cool.



Talking again about the perception of AWS and the Enterprise. This is obviously a focus.

What are they working on next? Amazon VPC is a focus (to continue to build the Enterprise), Direct Connect, Amazon Route 53 (DNS Services)

I'm actually gonna bail on the rest of this so I can go get a seat in the labs before they fill up. (Scratch that, the line is so long for the labs they are useless)


They appear to be positioning themselves as the "Walmart of the Cloud" - Low Price Leader and pass savings on to you.  Key message also was to recognize that Enterprise will continue to use on-premise

Summary - Good stuff, it is good to hear them focus on the Enterprise and do it in a way that isn't as in your face as it was at the AWS re:Invent conference.

Friday, April 5, 2013

March Recap

This post is a few days late but I wanted to put together a recap of everything that has been happening in March. To say March was a busy month is an understatement!  I'm not sure how much content I'll be able to post here in April as I have two speaking engagements to prepare for and I have decided I'm going to transition this blog away from Blogger and Feedburner to a WordPress hosted site.  Look for the new site probably sometime in late May based on my schedule right now.

March was our busiest month in recent memory at The Cloudcast (.net).  We published seven podcasts in March including the beginning of our expansion plans with our first podcast branch, the Mobilecast, as well as our first in a series of guest hosts, the always awesome Amy Lewis at Cisco.  Our goal for 2013 is to extend our reach into areas people have told us they want as well as some new faces to the podcast.  Please tell us what you think!

The Cloudcast #76 - Bringing Depth to PaaS for Real World Deployments
The Cloudcast #77 - OpenStack, PaaS APIs, Platform Tools, Automation & News
The Cloudcast #78 - Open Source Software 101
The Cloudcast #79 - DevOps Evolution and the Phoenix Project
The Cloudcast #80 - Regional Cloud Madness

The Mobilecast #1 - A Year of Going Mobile
The Mobilecast #2 - Health, Fitness and Wearable Computing

In addition to this blog I have also been asked to blog about Cloud Computing over at Tech Target.  I have a pretty extensive consulting and operations background so I have been asked to think about cloud computing from an operations standpoint.  I'm aiming for at least one blog a week over there.  Please head on over and subscribe to the blog!  I met my goal in March, here are links:

What Happens When Your Cloud Goes Away?
Cloud Applications and Vanishing Software Generations
Will Clouds Ever Be Open?
Impacts of Cloud Workload Consolidation

Last (but not least!) on this site I published two articles, one on the NYC Cloud Computing Meetup I attended and a new semi-regular news link round up I plan to do.

NYC Cloud Computing Meetup Recap
In Case You Missed It #1

As always, thanks to everyone for coming by and look for big changes coming "soon"!

Tuesday, March 19, 2013

NYC Cloud Computing Meetup Recap

Last week I was able to attend the New York City Cloud Computing Meetup.  It was a very cool event and Joe Brockmeier presented Deploying Apache CloudStack from API to UI.



Deploying Apache CloudStack from API to UI from Joe Brockmeier

Joe did an awesome job (as always) and the meetup was nicely attended, I would estimate about 40-50 people were in the room.  Here are a few random thoughts and impressions in no particular order:

  • The session was very interactive. It took the crowd a little bit to come out of their shell but once they did the discussion was very free form and constructive
  • The level of questions was very good.  Many were implementation and architecture related questions about specific features. Snapshots in particular generated a lot of discussion on slide 26. It appears we are starting to move beyond the basic cloud definitions and into the nitty gritty of implementations
  • There were customers in the room and they greatly helped with the discussions (Thanks Jeff @ DataPipe!). It was great to hear how their real world experiences were put to use and how they were able to tackle some of the issues and concerns brought up
  • I like how Joe started with some features of the NIST definition and then added an additional point (see slide 4). I agree with Joe that API access is crucial going forward
  • Slide 15 (the architecture overview) generated a lot of real world discussion in the room that I believe was very helpful to everyone
All in all a great event!

Tuesday, March 5, 2013

In Case You Missed It #1

I'm going to try something new and see how this works. I read a LOT of Cloud Computing news.  When I was speaking on a panel recently I was asked afterwards why I don't share a lot of the news I find interesting and thought provoking.  Great question and here is an initial attempt to do just that.

Below is a list of articles I found interesting over the last two weeks and some commentary on what I see going on in the industry.  I'm still not 100% on the format so let me know what you like and want to see changed.

Events & Misc. Links


Amazon News
Amazon continues to steam ahead but the last few announcements have been very interesting.  In their quest to add more value (and lock-in) to their ecosystem, a bunch of small companies with products built around their cloud were put on notice.  How does a small startup compete with AWS when they decide to move into that space?  Time will tell...

Open Compute
One big OpenStack story to focus on from yesterday, IBM going "all in" with OpenStack.  I saw this one coming a mile away.  Even though I'm now employed by one of the vendors I posted about I still contend that it depends on which vendors show up to the OpenStack Party. As an outsider looking in it appears HP is "phoning it in" (and a lot of people are leaving), while IBM and RedHat are getting serious.


VMware
Beating up VMware has become the cool thing to do.  I joked about it on Twitter but I believe VMware's message from PEX (VMware Partner Exchange) last week sent the wrong signal, the same way I felt AWS sent out some bad mojo at their conference late last year.  The big guys tend to approach this as all or nothing and everyone else is the enemy (it's their job, don't blame them) but most customers I talk to don't see it this way at all.


Wednesday, February 27, 2013

ApacheCon LiveBlog: Software Defined Networking (SDN) in CloudStack


This is a live blog from ApacheCon that I'm attending this week.  This session is with Chiradeep Vittal.

Usual Live Blog Disclaimer: This is more of a brain dump typing as fast as I can, please excuse typos, format, and coherent thought process in general.



  • Introduction is about how Amazon built a cloud (see his previous session for this part)
  • SDN Definition - Separation of the Control Plane from the hardware performing the forwarding - Also centralized control
  • Central control eases configuration, troubleshooting, and maintenance over time
  • Eliminates the tedious "log into every box" idea of network maintenance, log into controller
  • Is OpenFlow SDN? - NO, it is a protocol for the control plane to talk to the forwarding elements
  • Control is on the "top" and forwarding is on the "bottom"
  • flexibility example, different route based on direction. Box A and Box B, different flow from A to B and B to A if needed
  • IaaS and SDN go hand in hand - Agility, API configuration, Scalability,  Elasticity (all the ity's!)
  • SDN enables virtual networking - the illusion of isolated networks on a physical wire
  • SDN does have issues - Discovery of virtual addresses -> physical address mapping for instance
  • He is now going over a multi-tenant topology example:

  • CloudStack model - map virtual networks to physical network - define and provision networks and manage elasticity and scale
  • CloudStack Network Model is very robust (see pic, too much to type, things in box tend to be SDN functions)
  • How do we put this together?
  • CloudStack Service Catalog - Cloud users don't see the "guts" of the configuration, the cloud admin or operator designs the service catalog and presents this to the users
    • example - Gold Network - LB + FW + VPN using virtual appliances
    • Platinum - LB + FW + VPN but using hardware devices
  • Now going over topology example of the Gold offering & Platinum (uses Juniper firewall and Netscaler to Load Balance):
  • In both examples the user has no idea if they are on the Gold or Platinum network
  • Multi-Tier virtual networking - can define application tiers and isolate based on need as well, who is connected where
  • Orchestration - He went through the Multi-Tier example and demonstrated all the steps that would have to be done manually (too many to list) and this will all be done through orchestration
  • CloudStack Orchestration Architecture (see picture) - plugin Framework allows this to happen
  • SDN works with CloudStack through the plugin model, the SDN controller talks to the plugin, today there is integration with Nicira NVP, BigSwitch, Midokura, and CloudStack Native (requires XenServer)
  • CloudStack Native Controller uses GRE and talks to Open vSwitch on the XenServer
  • All isolation happens through the concept of a tenant key over the GRE tunnels. Each tenant has a unique key
  • What makes the CloudStack controller different? 
    • It is purpose built for IaaS and is not a general purpose SDN solution
    • Proactive model - Deny all flows except ones programmed by the end-user API - others send to central controller and may have problems at scale
    • Use the CloudStack virtual router to provide L3-L7 services (mainly because most hardware doesn't understand GRE today)
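To illustrate the tenant key and the proactive "deny everything that wasn't programmed" ideas from the bullets above, here is a toy Python sketch. The class, its method names, and the way keys are handed out (a simple counter) are my own assumptions; this is not CloudStack's controller code or how it actually assigns GRE keys.

```python
class TenantNetworks:
    """Toy controller: one unique key per tenant, and only programmed flows are allowed."""

    def __init__(self):
        self._keys = {}        # tenant name -> GRE key (hypothetical numbering)
        self._allowed = set()  # (key, source, destination) tuples
        self._next_key = 1

    def key_for(self, tenant):
        """Return the tenant's key, allocating a new unique one on first use."""
        if tenant not in self._keys:
            self._keys[tenant] = self._next_key
            self._next_key += 1
        return self._keys[tenant]

    def allow(self, tenant, src, dst):
        """Program one directional flow for a tenant."""
        self._allowed.add((self.key_for(tenant), src, dst))

    def permits(self, tenant, src, dst):
        """Default deny: a flow passes only if it was explicitly programmed."""
        return (self._keys.get(tenant), src, dst) in self._allowed


controller = TenantNetworks()
controller.allow("tenant-a", "web-1", "db-1")

print(controller.permits("tenant-a", "web-1", "db-1"))  # programmed flow
print(controller.permits("tenant-a", "db-1", "web-1"))  # reverse direction was never programmed
print(controller.permits("tenant-b", "web-1", "db-1"))  # different tenant, different key
```

Because the key is part of every flow entry, two tenants can reuse the same VM names or addresses without seeing each other's traffic, and each direction of a flow has to be programmed on its own.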

ApacheCon LiveBlog: Powering CloudStack w/ Ceph RBD


This is a live blog from ApacheCon that I'm attending this week.  This session is with Patrick McGarry.

Usual Live Blog Disclaimer: This is more of a brain dump typing as fast as I can, please excuse typos, format, and coherent thought process in general.

(No title slide picture this time - missed it)

  • What is Ceph - storage that does object, block, and file all in one; block is thin provision, snapshots, cloning - object has REST API
  • RADOS (Google it) object store at the lowest level
  • Why Object at the lowest level - more useful than blocks, single namespace, scales better, simple API, workload is easily parallel
  • Because of this: define a pool (1 to 100's), independent namespaces and object collections
  • (Topic change) - Architecture
  • aggregate a bunch of different machines so that you have a "large enough" front end to handle a large number of incoming requests
  • In this "pile" you will have monitors. Monitors provide consensus for decisions, there is always an odd number of them, and they do not store data; they act as traffic controllers for the storage nodes (OSD nodes)
  • On an OSD node -> physical disk -> file system -> OSD layer
  • CRUSH - pseudo-random placement algorithm for data placement, the Ceph "secret sauce", allows for stable mapping and uniform distribution with additional rule-based configuration (can apply weights, topology rules)
  • How does it work, take an object, talk to monitors, CRUSH breaks it up, places it around according to the rules
  • What happens when something breaks? If an OSD node is lost, the nodes holding the other copies of the data replicate the blocks somewhere else according to CRUSH rules and move on
  • How to talk to it? LIBRADOS - library for RADOS, support for C, C++, Java, Python, Ruby, PHP
  • Also RADOSGW - REST gateway compatible with S3 & Swift
  • Ceph FS - A POSIX-compliant distributed file system with a Linux kernel client
  • RBD - reliable and fully-distributed block device sitting on top of the object store
  • RADOS Block Device (RBD) - storage of disk images in RADOS, decouples the VM from the host, images striped across the pool, snapshots, copy-on-write clones
  • What does this look like? vm's are now split across the cluster, great for large capacity as well as high I/O instances of vm's
  • same model as Amazon EBS
  • it is a shared environment, so you can migrate running instances across cluster
  • Copy-On-Write Cloning (he gets lots of questions on this) - think of a Golden Image Master vm and you want 100 copies - You spin up the 100 instantly and they only take up additional storage as the vm's grow.
  • Question: Is there a performance impact to this? A: No, but as usual it depends on the architecture (how many devices are hitting it)
  • CloudStack 4.0 and RBD? via KVM, no Xen or VMW support today
  • Live migrations are supported
  • No snapshots yet
  • NFS still required for system vm's
  • Can be added easily as RBD Primary storage in CloudStack
  • snapshot and backup support should be coming in version 4.2, cloning is coming, support for secondary storage in 4.2 (backup storage is coming in 4.2)
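
To make the "stable mapping with weights" idea behind CRUSH a bit more concrete, here is a minimal sketch. It is NOT the CRUSH algorithm; it swaps in weighted rendezvous (highest-random-weight) hashing, which shares the properties called out in the session: any client can compute where an object lives without a lookup table, heavier nodes get more data, and losing a node only moves the data that lived on it. The OSD names and weights are invented for the example.

```python
import hashlib
import math


def _score(object_name: str, osd: str, weight: float) -> float:
    # Hash (object, osd) to a number strictly inside (0, 1), then apply the weight.
    digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
    x = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    h = (x + 0.5) / 2**52
    return -weight / math.log(h)


def place(object_name: str, osds: dict, replicas: int = 2) -> list:
    """Return the OSDs that should hold the object, highest score first."""
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd, osds[osd]), reverse=True)
    return ranked[:replicas]


# Hypothetical cluster: osd.2 has twice the weight (say, a bigger disk).
osds = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
print(place("vm-image-42", osds))

# Simulate losing osd.1: only objects that had a copy there get a new location.
survivors = {name: w for name, w in osds.items() if name != "osd.1"}
print(place("vm-image-42", survivors))
```

Real CRUSH adds a hierarchy (hosts, racks, rows) and placement rules on top of this kind of deterministic hashing.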



ApacheCon LiveBlog: DevCloud - A CloudStack SandBox



This is a live blog from ApacheCon that I'm attending this week.  This session is with Sebastien Goasguen.

Usual Live Blog Disclaimer: This is more of a brain dump typing as fast as I can, please excuse typos, format, and coherent thought process in general.

  • Today's talk will focus on DevCloud and CloudMonkey
  • Sebastien giving overview of IaaS market in general. He was actually an OpenNebula guy prior to CloudStack
  • With IaaS setting up a virtual sand box can be tricky since there are a large number of moving parts: hypervisors, storage, networking
  • DevOps - quick introduction to DevOps to help everyone understand why this is such a big movement in the industry right now (bringing development closer to the operations)
  • This helps us set up an environment to enable a software defined datacenter that allows for automation at all levels
  • Now talking about ASF (Apache Software Foundation) and CloudStack. He has a LOT of analysis around the community. The growth once joining as an incubation project shows a HUGE spike (CloudStack is now the #1 Apache project when it comes to commits)
  • On to the internals of CloudStack, goal is to be as agnostic as possible (multi-hypervisor, both block and object storage) 
  • Network tends to be the most challenging for new folks (firewall, load balancing, basic networking vs. advanced networking, VPN, etc.) - See the bottom line on the picture above
  • Apache CloudStack 4.0 was released in November, 4.0.1 was just released, 4.1 is set for March. Goal is a new release every six months
  • Architecture -> Zone(datacenter) -> Pods(rack) -> Cluster (hosts) -> primary & secondary storage -> Instance (virtual machines)
  • Centralized management server - can be multiple management servers behind a load balancer and replicated MySQL for large scale
  • system vm's are used to communicate from the management server to some features (firewall, secondary storage, etc.)
  • (Topic change) - What is DevCloud - CloudStack in a box, aimed at developers but can be a local EC2/S3 "cloud in a box"
  • self contained - cloudstack management server, ttylinux (to stay small), system vm's, MySQL, interface all on one laptop - on a beefy laptop expect a good number of instances
  • What is CloudMonkey - cloudstack CLI - great for auto-completion of features, tabular output, help, scriptable, shell interaction, etc.
  • Intro - Launch CloudMonkey, you now have a shell to talk to your cloud, need to do a key exchange, then ready to access your devCloud instance
  • Demo Time - He is running VirtualBox on a MacBook Air, he is using a NAT interface, forwarding a few ports needed (8080, 2222, 8443, 5901, 7080) - The DevCloud vm uses nested virtualization to launch instances inside the virtual machine on the laptop
  • 2nd Demo - He is also running the 4.0.1 release directly on his laptop from the source code, instead of inside the DevCloud vm.
  • Back to DevCloud - He shows the system vm's up and running and an instance that is halted.
  • Went into Web UI - Gave an overview of the Infrastructure, you will have a zone and pod that is defined (named devcloud), from there a single host as well
  • Secondary storage - NFS storage is built in and emulated, primary storage is "local". No need to stand up an external NFS service
  • templates - the system vm's and the small linux template are already included.
  • Sebastien went through creation of a new instance using the included tiny instance and shows everything spinning up.
  • You can take snapshots (saves to secondary storage)
  • The first time a template is used it is pulled from secondary storage and copied down to primary storage
  • Global Settings - turn on the EC2 API feature if you want to run EC2 commands against it
  • Now going over CloudMonkey features
  • First thing, set the API key (get this from the UI)
  • Now you can do common tasks (list virtual machines, start/stop virtual machines, etc.)
  • Another way to use DevCloud: different network type, 2 vNICs, one host-only and one NAT
  • Build it from source (need Maven dependencies), deploy the database, basically build it yourself. Because you build it this way, there are no zones, pods, etc.  You build everything yourself.
  • One thing you can do with this is build your entire infrastructure from scripts. This allows you to test the build process of CloudStack for replication.  This is a very powerful use case!
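
For the scripting use case, the API key from the UI ends up in signed HTTP requests, which is what CloudMonkey handles for you. Below is a rough sketch of building such a request by hand, following the signing scheme as described in the CloudStack developer documentation (URL-encode values, sort by field, lower-case, HMAC-SHA1 with the secret key, Base64). Treat the details as something to verify against the docs for your release. The keys are placeholders, and the host and port assume the DevCloud port forwarding (8080) mentioned above.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_query(command: str, api_key: str, secret_key: str, **params) -> str:
    """Build a signed query string for a CloudStack API command."""
    fields = {"command": command, "apikey": api_key, "response": "json", **params}
    # Sort by field name and URL-encode each value.
    pairs = sorted(fields.items(), key=lambda kv: kv[0].lower())
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs)
    # Sign the lower-cased query with HMAC-SHA1, then Base64 and URL-encode it.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"


# Placeholder credentials; real ones come from the user's page in the Web UI.
url = "http://localhost:8080/client/api?" + signed_query(
    "listVirtualMachines", "EXAMPLE_API_KEY", "EXAMPLE_SECRET_KEY"
)
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) against a running DevCloud would return the list of virtual machines as JSON.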

Really great presentation and great overview to those new to CloudStack and DevCloud!





Tuesday, February 26, 2013

ApacheCon LiveBlog: Object Storage with CloudStack & Hadoop


This is a live blog from ApacheCon that I'm attending this week.  This session is with Chiradeep Vittal.

Usual Live Blog Disclaimer: This is more of a brain dump typing as fast as I can, please excuse typos, format, and coherent thought process in general.


  • How does Amazon build a cloud:
    • Commodity Hardware -> OpenSource Xen Server -> AWS Orchestration Software -> AWS API -> Amazon eCommerce Platform
    • How would YOU build the same cloud on CloudStack - You can in much the same way: Hardware -> Hypervisor -> CloudStack -> API -> Customer Solution
  • CloudStack is built on the concept of a Zone (much like an AWS Zone)
    • Under the zone is a logical unit of Pods (think of it as a rack)
  • Secondary Storage is used for Templates, snapshots, etc. (items that are stored and not changed often, need to be shared across pods)
  • Cloud Style Workloads = low cost, standardized hardware, highly automated & efficient (it's the Pets vs. Cattle analogy)
  • At scale, everything breaks eventually
  • Regions and Zones - e.g. Region "West", the hope is that one Region will not go down when another Region goes down. - Replication from one Region to another Region is the norm
  • Secondary Storage in CloudStack 4.0 today
    • NFS is the server default - mounted by any CloudStack Hypervisor, easy to set up
    • BUT - doesn't scale well, "chatty", may need WAN optimization. What if 1000 hypervisors talk to one NFS share?
    • At large scale NFS shows some issues
    • One solution is use object storage for secondary storage
  • Object Storage has redundancy, replication, auditing built in to the technology typically
  • In addition, this technology enables other applications, put an API server in front of the object store and you now have "Dropbox", etc.  typically static content and archival kinds of applications
  • Object is 99.9% availability and 99.999999999% (eleven 9's) durability according to Amazon S3, and massive scale (1.3 trillion objects in AWS today serving 800k requests per second)
  • Scalable objects can not be modified, only deleted (called an Immutable object)
  • Simple API with a flat namespace - think KISS principle
  • CloudStack S3 API Server - understands Amazon S3 API with a pluggable backend, default backend is a POSIX filesystem (not very useful in production), Caringo was mentioned as a replacement, HDFS is another possible replacement
  • Question - Does CloudStack handle all the ACL's / Answer: Yes
  • FollowUp - Does that mean SQL Server is a possible constraint / Answer: Yes
  • Integrations are available with Riak CS and OpenStack Swift
  • Upcoming in CloudStack 4.2 - Framework to expand this much more
  • Given all of this, what could we build? (Topic switch)
  • Want: Open Source, scales to 1 billion objects, reliability & durability on par with S3, S3 API
  • This is now a theoretical design (hasn't been tested)
  • (See picture for architecture)

  • Hadoop meets all of these requirements and is proven to work (200 million objects in 1 cluster, 100PB in 1 cluster), need to scale, just add a node, very easy
  • BUT - Name Node Scalability (at 100's of millions of blocks, could run into GC issues), Name Node is a SPOF (Single Point of Failure) - this is being worked currently, Cross Zone Replication (Hadoop has rack awareness, what if further apart?) - this isn't really tested today, where do you store metadata (ACL's for instance)
  • take a 1 billion object example (bunch of assumptions here) - needs about 450GB per name node, 16TB / node = 1000 data nodes
  • Name Node management is federated (sorry this is vague, getting beyond my knowledge of Hadoop architecture at this point). Name Node and HA really hasn't been tested to date
  • NameSpace shards, how do you shard them? Do you need a DB just to store this?? What about rebalancing between name nodes?
  • Replication over lossy/slower links (solution really breaks down here today)
    • Async replication - how do you handle master/slave relationships?
    • Sync - not very feasible if you lose a zone (writes never acknowledged so will not continue)
  • Where do you store Metadata?
    • Store in HDFS along with the object, reads become expensive and metadata is mutable (needs to be edited), needs a layer on top of HDFS
    • Use another storage system (like HBase) - required for Name node federation anyway, but ANOTHER system to manage
    • Modify the Name Node to store the metadata
      • high performance (doesn't exist today)
      • not extensible and not easy to just "plug in"
  • What can you do with Object Store in HDFS today?
    • Viable for small size deployments - up to 100-200 million objects (Facebook does this) with datacenters close together
    • Larger deployments need development and there is really no effort around this today
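
As a sanity check on the 1 billion object example, the figures from the session can be turned into per-object numbers. This is only back-of-the-envelope arithmetic: the speaker's actual assumptions were not captured in these notes, the 450GB is read here as the memory for one name node holding the whole namespace, and the replication factor of 3 is an assumption (it is the HDFS default).

```python
OBJECTS = 1_000_000_000
NAME_NODE_MEMORY_GB = 450   # figure from the session
DATA_NODES = 1000           # figure from the session
TB_PER_DATA_NODE = 16       # figure from the session
REPLICATION = 3             # assumption: HDFS default replication factor

# Name node memory implied per object (1 GB = 10**9 bytes).
name_node_bytes_per_object = NAME_NODE_MEMORY_GB * 10**9 / OBJECTS

# Raw and usable capacity across the data nodes (1 PB = 1000 TB).
raw_capacity_pb = DATA_NODES * TB_PER_DATA_NODE / 1000
usable_capacity_pb = raw_capacity_pb / REPLICATION

# Average object size the cluster could hold (1 PB = 10**9 MB).
avg_object_mb = usable_capacity_pb * 10**9 / OBJECTS

print(f"name node memory per object: {name_node_bytes_per_object:.0f} bytes")  # 450 bytes
print(f"raw capacity: {raw_capacity_pb:.1f} PB")                                # 16.0 PB
print(f"average object size at 3x replication: {avg_object_mb:.1f} MB")        # 5.3 MB
```

So the example works out to roughly 450 bytes of name node memory per object and objects of a few MB each, which fits the "static content and archival" workloads mentioned earlier.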