StackExpress managed a Docker-based public cloud learning platform for one of its clients. For performance and cost reasons, it maintained its own private Docker registry. Later, Docker released a better but backward-incompatible v2 registry. To gain access to the improvements in the new registry, and for future compatibility, the learning platform had to be migrated from the current (v1) Docker registry to the newer (v2) Docker registry.
For any regular Docker-based application stack, with a few application images, a registry migration would not have been an issue. However, in this case, we had many tens of thousands of images to migrate. From planning to execution, the team completed the migration in a week's time. We also reported multiple bugs encountered along the way to help improve the upstream registry migration tool.
Optimize AWS costs
Spot instances and ClusterK management
Moving existing static EC2 infrastructure to CloudFormation-based on-demand/auto-scaling infrastructure, with secure IAM policies
Migrating existing EC2 infrastructure to VPC
Custom AMIs in various AWS regions
Migration from private cloud to EC2, and vice-versa
One of our clients is a startup hosted entirely on AWS. They started with 2 EC2 instances, and now have 20+ EC2 instances along with various other AWS services. The entire deployment on AWS was managed by hand. The client's team was spending most of its time handling peak overload, instance management and alerts. The CEO wanted a reliable partner who could handle the infrastructure while employing best practices, so that their software engineers could stay focused on making their product better.
Our AWS specialists came up with a parallel infrastructure using CloudFormation, VPC, IAM, and Auto Scaling, making way for a more secure and scalable infrastructure. After testing and client review, the existing services were migrated to this new infrastructure over the next few weeks. Once StackExpress's migration team handed over the deployment to our DevOps team, we completed the deployment with monitoring and alerts to make sure the application maintains a healthy SLA.
Manage MongoDB deployments (single container/instance to sharding, clustering)
Manage PostgreSQL databases
Manage MySQL databases
Migrate databases to/from RDS
The client had its MySQL database hosted on an EC2 instance, and wanted to migrate it to AWS RDS MySQL for ease of management, upgrades, and monitoring. Since the database stores confidential personal information, they required encryption for the data at rest. As a disaster recovery (DR) measure, the client stored MySQL backups on a different cloud service, and wanted these backups to be stored encrypted. On the application side, they wanted their Ruby stack upgraded to newer stable releases, while making sure the application doesn't break.
We evaluated the requirements and came up with a MySQL DB migration plan, with the relevant DB configuration, which covered the security and other improvements expected by the client. Upon approval, the migration was successfully executed in a short period of time. This was followed by a secure implementation of off-site backups protected by multiple keys. Next, we upgraded their custom AMIs with the latest stable Ruby release at the time, and tested thoroughly to make sure we didn't break any part of the application in the process. As a bonus, we updated their CloudFormation templates to allow for a rolling application upgrade process.
Manage configuration and deployments using Ansible, Chef or SaltStack
Migrate from existing scripts/Makefiles to deployment managed by Ansible or Chef
Client’s Challenge #1
The client was using Makefiles for their deployments. While the whole process was pretty refined and mostly pain-free, it offered limited room for dynamic changes and no simple way of specifying complex changes across multiple targets. The client wanted something that gives them an easy yet flexible DSL, and the ability to generate deployment-specific configuration dynamically.
StackExpress has been using Chef for many years now, in small to large and simple to complex deployments. It suited the client's requirements well, as it provides a platform-agnostic (in most cases), easy-to-use DSL, while still allowing Ruby code to be sprinkled in for dynamic and flexible expression. Over the following months, alongside other AWS infrastructure improvements, we gradually ported all of their deployment work to Chef.
Client’s Challenge #2
The client had a sudden vacancy in their DevOps role, and their production deployment had a few hundred servers across multiple public clouds. They used certain configuration management and deployment tools in which the StackExpress team did not have strong expertise. The lack of documentation didn't help the case either.
We completely understand that startups have to be fully focused on their product, even when they miss a few good practices (like documentation here) that are necessary to scale in the long term. Our team discussed the opportunity internally, and we took it on. Over the next few weeks, we built the internal competency not just to serve in the DevOps role, but also to assist in troubleshooting complex business-critical issues, which required us to traverse their entire application code-base.
Custom app development in Node.js, Ruby and Python
Technical guidance on architecture, components, and development practices
The client created a cloud learning platform accessible to all for free. Over time it has become very popular among students, hobbyists, and professional developers. It is frequently featured on developer discussion sites (like StackOverflow.com). Like any product-focused company, the client wanted to reassign their own resources to their primary product, and searched for a competent agency to maintain this learning platform at standards on par with or exceeding those maintained by the client's team.
StackExpress is primarily a DevOps consulting and operational support services provider. However, we also have an exceptional team of full-stack, backend and frontend developers. The client interviewed our team, and liked our dev workflow. We started with a small project for this platform. Today, we independently handle all aspects of this platform, and are frequently adding new features to it.
Set up staging, QA and dev environments
Monitoring setup. 24x7 monitoring and alert response
ELK / Graylog setup
Managing entire cloud deployments (AWS, GCE and DigitalOcean)
Assist in mitigating DoS/DDoS attacks
The client has a well-managed tech stack deployed on top of AWS EC2, and an existing DevOps services company to manage it for them. However, they still need someone to stay on top of their alerts 24x7x365, a gap their current DevOps provider is unable to fill.
StackExpress offers 24x7 alert management and response for just this purpose, and most of our clients subscribe to this service. We've evolved from a public cloud and shared hosting business, where availability across all time zones is critical to fulfilling our SLAs. We support everything from on-premise Nagios deployments to New Relic and Datadog. Just add us to your PagerDuty or Opsgenie on-call schedule.