Self-service creation and deployment of on-demand macOS CI environments for remote developers

The following is an interview with Filip Viskovic of Improbable on how they used Anka to set up self-service creation and deployment of on-demand macOS CI environments for 100s of remote developers. Improbable is a company dedicated to providing services for extraordinary multiplayer games. They have been users of Anka since 2019.


Before you found Anka, do you remember what sort of problems you were experiencing with your build/test/ci/cd tooling and setup?

Improbable’s previous build system(s) and its problems

Our CI systems started as a collection of bare-metal and VM environments enrolled into an agent-based configuration management system. These environments were shared by all developers at the company with little in the way of access control and a lack of clear ownership between a CI agent and a team. Over time the number of fixes not captured in code prevented us from confidently rebuilding these environments. This became an operational headache as we scaled our technical needs. Our developers saw an increase in their CI outages which hindered them from working on their products.

It became obvious that this solution was not fit for a company with a rapidly growing developer population. As such, my team was formed to tackle our company-wide CI pain points. For Windows and Linux, we designed a set of systems that allow our end-users to create, deploy, and maintain their CI in the cloud. This is done by using ephemeral VM instances which we scale according to demand. The VM images are built via a pipeline that uses Packer and Ansible to capture a golden image. Our end users then use the id of this image to define in code where they want their pipeline to execute. This has been a great solution for us, as it allows our team of approximately 6 to provide a self-service CI platform to the 100s of our developers across the globe without the need to embed build engineers in each team. You can read more about our design goals and final implementation in our company blog post.

We maintained the bare-metal style of management for our Mac agents until we were able to port all builds happening on the other two operating systems onto our new platform. For macOS support, we wanted to implement a similar solution, as close to our ideal as possible. 

In short, we wanted to solve the following pain points:

  • We had heaps of bare-metal agents, which we charitably described as inconsistent. Even though we had agent-based configuration management in place, it was not enough to ensure build environment consistency without additional complexity. It was also possible for configuration management to change the state of the agent while a build was running.
  • Upgrading the OS in place of a bare-metal machine often causes the build environment to break in fun and mysterious ways. Xcode upgrades also fall under this category.
  • Making broad changes to the agent baseline behaviour by our team (git or perforce configuration, monitoring and logging, security, etc) required manual steps for testing across all of our supported operating systems.
  • We had a strong requirement to be able to reproduce historical builds and the CI environments used to build them. 
  • Mac management includes some additional tasks when building them into CI agents. Building a machine from scratch requires manually running the OS installer and completing the out-of-box setup process. Automation for these steps and other security authorisations, such as approval of system extensions, could only be achieved by using MDM workflows.
  • The testing and building of new agents by engineers were bottlenecked by our ability to assign Ops capacity to wrangle hardware. During the global pandemic, this stressed our internal processes.
  • On the flip side, we have several CI jobs that can only be run on Improbable owned and managed hardware. Our public certifications and other sensitive requirements prevent us from using the cloud for these jobs.
  • Upgrading the OS in place of a bare-metal machine often causes the build environment to break in fun and mysterious ways. This runs against the grain with our need to run the latest version of macOS, with a delay up to ~6 months since the version release. Again, our security compliance sets hard requirements on our developer environments.

Do you remember what originally drew your attention to Anka compared to other solutions?

Choosing Anka

So to help us achieve reproducible builds for macOS, we tested Anka. We spiked it for a couple of weeks and were able to implement an effective local workflow for testing Mac CI environments. It satisfied our initial requirements and provided some additional benefits:

  • All the benefits of running workloads inside of an immutable VM along with Docker-like behaviour and layered images. 
  • By moving the builds to happen inside of a VM, the host’s only responsibility is running Anka. These thinly-provisioned hosts are simpler to maintain and could be built from any Mac hardware. We have the option of scaling onto the cloud if needed in the future.
  • Anka has a Packer builder. Since we’re already using Packer for building Linux and Windows VMs, this means less bespoke code for us to maintain.
  • Veertu also provides an image registry for the storage and distribution of VMs. We’re able to easily run this in our Kubernetes cluster with a persistent volume claim.

While testing Anka, what sort of surprises did you experience?

Results

We took our on-premise Mac Minis and turned them into Anka-VM-nodes. The agreement between our team and our end users is that we are responsible for the availability of these nodes, while our end users are only concerned with the VM environments in which their CI jobs run. For building new VM images, we’ve integrated Anka into our pre-existing Packer/Ansible implementation, therefore our end users can use the existing pipelines and processes they are accustomed to. Since there are limitations to running macOS as true ephemeral machines, we’ve opted to keep our hosting on-premise for now.

We carefully re-architected our environments to avoid common mistakes and provided these benefits into our base VM layer which we call the OS layer. No credentials or other secrets are ever baked into a VM and are instead dynamically injected by the Anka-node hosts. We try to remove or disable anything that may change the state of the VM when it’s in use. The VMs do not run with admin privileges. By using this “batteries-included” approach, we’ve also reduced the amount of context required by our end users to use macOS.

Where is Improbable today

  • For the first time, it’s possible to remotely self-service the creation and deployment of macOS CI environments by our users. Previously, the iteration loop for creating new Mac CI machines from scratch could take up to a day, and many iterations may be required to build into a reliable agent. With Anka, we’ve been able to reduce this loop to ~25 minutes. The deployment of Anka images onto our nodes takes a few minutes and is further sped up by locally caching the OS layer of the image.
  • The separation of the Anka VM and the Anka-node allows us to perform OS upgrades and testing separately from each other. Our end users get the choice of upgrading when they can, while our team can maintain security compliance on the nodes without drastically affecting capacity.
  • We’ve been able to automate the manual testing of agent baseline behaviour by including macOS environment building within our matrix-build pipeline, reducing our manual testing needs by 10s of hours each quarter.
  • Flakey CI environments can be almost instantly rolled back by restarting the VM from a specific image tag. The benefit of this feature is hard to estimate but it eliminates many categories of debugging costs from our users.
  • If needed, we’re able to double our capacity of macOS CI agents by running two VMs on a single node.
  • Anka provides useful workarounds for automating the out-of-box experience and approving of systems extensions. We no longer need to maintain an MDM implementation for our build environments. The IT and security teams, who maintain our company MDM services, are no longer on the critical path for making changes in our environments.

We’re currently porting all of our bare-metal environments into Anka and seeing some great results so far. I hope I was able to give you some insight into what we’ve built and the impact that we’ve seen.

Share this post

networking-performancev1
Unlocking Superior macOS VM Network Performance: Introducing Anka's new networking mode for Apple Silicon
Large and complex enterprises using Anka have many different demands, and we have worked to continue to develop innovative technology to meet these demands. Enterprise infrastructure hardware is often on the cutting edge, and they need advanced capabilities...
Read More
gitlab-with-anka
Anka Cloud Gitlab Executor
Veertu’s Anka and the new Anka Cloud Gitlab Executor Veertu’s Anka is a suite of software tools built on the macOS virtualization platform. It enables the execution of single or multi-use macOS virtual machines (VMs) in a manner similar to Docker....
Read More
mac-scan-v1
Real-Time CVE Scanning of your macOS Build Systems
It’s common that an organization’s macOS build system will download thousands, sometimes tens of thousands of third-party dependencies every hour. When building and testing iOS applications, it typically downloads and installs third-party...
Read More
anka-on-silicon-v1
The ONLY Fully Automated Apple Silicon macOS VM Creation Solution
Starting in Anka 3.1 we announced that Anka is now able to fully automate the macOS installation processes, disabling SIP, and enabling VNC — all previously manual steps users had to perform inside o the VM. At the time of writing this article,...
Read More
anka_click
Scripting macOS UI User Actions With Anka Click
Starting in Anka 3.2, we’ve introduced a solution for scripting macOS UI user actions. You may ask, “Why would I want to do that?”. Well, often macOS configuration and applications do not have a CLI allowing you to perform certain actions...
Read More
mac-scan-fullscan-shells-v3
Real-time, continuous scan of file downloads on macOS for security vulnerabilities
Today, we are announcing the Beta availability of the Mac Scan solution. Mac Scan software runs on macOS systems (bare metal, virtual, EC2 Mac) and scans downloads in real time for security vulnerabilities. There are multiple scenarios why you would...
Read More
Screen Shot 2022-10-17 at 10.13
Anka 3.1- Fully automated VM macOS installation & The Behavior-Driven macOS UI Automation Framework
We are very happy to announce the General Availability of Anka 3.1 for Apple Silicon / ARM macs. In this release, we are taking our approach to iOS CI automation one step further by introducing a Behavior-Driven macOS UI Automation Framework in Anka,...
Read More
ankam1v2
Migrating from Anka on Intel to Anka on M1 Mac for iOS CI
In this blog, we will cover the key topics for migrating from Anka on Intel to Anka on M1/M2 Macs. Anka is an IaaC solution from Veertu to set up an agile Container like CI for iOS CI using macOS VMs. Anka for Intel uses Apple’s Hypervisor.Framework virtualization...
Read More
ami-scanner-featured-v2
World's first Security Vulnerability scanner for EC2 Mac AMIs
We are excited to announce the General Availability of the world’s first security vulnerability scanner for EC2 Mac AMIs. EC2 Mac AMI Scan scans Intel and Apple Silicon macOS EC2 AMIs, detects security vulnerabilities in third-party packages, dependencies,...
Read More
esxi-to-anka
It's time to migrate your iOS CI from ESXi Virtual Mac Infrastructure to native macOS Virtualization
When VMWare ESXi started officially supporting Apple macOS Virtualization on Mac hardware in late 2012, it opened the doors for the possibility of iOS development to move to a Linux-like, agile, scalable CI infrastructure. Soon enough, many iOS enterprise...
Read More