This is the second of two posts about the migration from our legacy infrastructure to xCloud on Terremark. You can find the first post here.

### Step 4: VM Creation

After the archives for a complete grouping of VMs reached the Terremark migration server, we could create VMs.
When creating VMs at Terremark, we normally use their vCloud-compatible API. It supports creating VMs from pre-defined templates and cloning existing VMs. Neither creating from a template nor cloning was a good option for us because we had a specific root filesystem. What we really wanted was to boot a “bare” VM with blank disks, lay down the root filesystem and the contents of other archives, and then run a VM-specific configuration script.
The Terremark management UI allows creation of blank VMs. We noticed that such VMs tried to PXE boot. PXE booting sounded like the ideal route because that’s how our legacy Xen hosts booted normally. Unfortunately, the API didn’t support creating blank VMs. And even if it did, we would have had to maintain necessary PXE infrastructure in many places due to other aspects of how the API works.
We worked with Terremark professional services to create a custom process that would let us create blank VMs where they needed to be and maintain only one PXE setup. We needed professional services for these items:
- Creating VMs with a specific name, CPU, memory, and disk configuration on the same layer 2 network segment as the PXE setup running on the migration server.
- Provisioning VMs by moving configured VMs to their final home, which included placing them in the right VMware VDC, attaching them to their internal network, and hooking them up in the Terremark management UI so they could be managed through the UI and API.
For each of these items we needed to track progress, so we added endpoints to the migration service that Terremark professional services could interact with. The migration service tracked each VM’s state and other details such as name, eventual VDC, and network. As VMs moved to the “needs creating” or “needs provisioning” states, the corresponding endpoints listed the appropriate VMs and related details. Terremark professional services wrote a program to poll these endpoints and create or provision the listed VMs.
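The endpoint idea can be sketched in a few lines of Python. This is only an illustration: the post does not show the migration service's code, so the `VM` record, the state constants, and `list_vms_in_state` are all hypothetical names, and a real endpoint would serve this payload over HTTP rather than return it from a function.

```python
from dataclasses import dataclass, asdict

# Hypothetical state identifiers; the post names only the
# "needs creating" and "needs provisioning" states.
NEEDS_CREATING = "needs_creating"
NEEDS_PROVISIONING = "needs_provisioning"


@dataclass
class VM:
    name: str
    state: str
    vdc: str      # eventual VDC
    network: str  # eventual internal network


def list_vms_in_state(vms, state):
    """Return the payload an endpoint for `state` might serve to a poller."""
    return [asdict(vm) for vm in vms if vm.state == state]


vms = [
    VM("app1", NEEDS_CREATING, "vdc-a", "net-a"),
    VM("db1", NEEDS_PROVISIONING, "vdc-a", "net-a"),
    VM("app2", "configuring", "vdc-b", "net-b"),
]

# A polling client would fetch each list and act on every VM it contains.
to_create = list_vms_in_state(vms, NEEDS_CREATING)
to_provision = list_vms_in_state(vms, NEEDS_PROVISIONING)
```

Keeping the state on the service side means the poller stays stateless: a VM simply disappears from a list once the service moves it to the next state.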
The migration service assigned a “configuration mode” MAC address and IP to each VM. This information was provided to Terremark via the “needs creating” endpoint. As part of the PXE infrastructure, we ran DHCP with a configuration generated from the MAC to IP mappings maintained by the migration service. This allowed us to link requests from an IP to the to-be-configured VM.
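As a rough illustration of generating DHCP configuration from those mappings, the sketch below renders one host stanza per VM and does the reverse lookup from a requesting IP. It assumes an ISC dhcpd-style configuration, which the post does not confirm, and the function names, MAC addresses, and IPs are made up for the example.

```python
def dhcpd_host_entries(mappings):
    """Render one ISC dhcpd-style host stanza per VM.

    `mappings` maps a VM name to its (mac, ip) pair in "configuration mode".
    """
    stanzas = []
    for name, (mac, ip) in sorted(mappings.items()):
        stanzas.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"}}"
        )
    return "\n".join(stanzas) + "\n"


def vm_for_ip(mappings, ip):
    """Reverse lookup: which to-be-configured VM is calling from `ip`?"""
    for name, (_mac, vm_ip) in mappings.items():
        if vm_ip == ip:
            return name
    return None


# Hypothetical example data.
mappings = {"app1": ("00:50:56:00:00:01", "10.0.0.11")}
config_text = dhcpd_host_entries(mappings)
```

Because each VM gets a fixed address, the service can treat the source IP of a request as the VM's identity while it is being configured.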
When PXE booting, the VMs needed something to boot. Our legacy Xen hosts already booted via PXE, and for that we had modified this script. Further adapting it to suit our needs was easy. You can see the script we used here; what the VM actually ran when it booted starts at line 103.
This script resulted in a fully functioning in-memory system with busybox and all it provides: ssh access, LVM utilities, and other things to configure the system. At lines 158 and 159, it fetches the VM-specific configuration script from the migration service (which conveniently was also the DHCP server) and sources it.

### Step 5: VM Configuration

The fetched and sourced script came from this mustache template. This step summarizes the interesting parts, with the database-related parts in Step 6.
The script kept the migration service updated during its execution with calls to the state_update and message_update functions. This let the migration service UI show state changes and log messages, and let the service hold other VMs back from progressing if one in their group had failed. The script ran with the errexit shell option. This meant that if any command exited with a nonzero status and wasn’t checked by an if or similar statement, the script exited. The trap at line 2 told the migration service an error had occurred.
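The real script was shell, relying on errexit and a trap. The same fail-fast pattern can be sketched as a Python analogue, shown here only to make the control flow concrete. The `state_update` and `message_update` names come from the post, but their signatures, the `Reporter` class, and `run_steps` are assumptions for this example.

```python
class Reporter:
    """Collects updates; a real one would send them to the migration service."""

    def __init__(self):
        self.events = []

    def state_update(self, state):
        self.events.append(("state", state))

    def message_update(self, message):
        self.events.append(("message", message))


def run_steps(reporter, steps):
    """Run (name, callable) steps in order, stopping at the first failure."""
    for name, step in steps:
        reporter.state_update(name)
        try:
            step()
        except Exception as exc:
            # Plays the role of the shell trap: report the error, then stop.
            reporter.state_update("error")
            reporter.message_update(f"{name} failed: {exc}")
            return False
    reporter.state_update("complete")
    return True
```

Stopping at the first failure matters here because later steps (formatting, unpacking archives) assume the earlier ones succeeded.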
To start, the script mounted a couple of NFS shares at the migration service’s IP. These shares contained the filesystem archives for VMs that were being migrated and support data, such as a new kernel and modules.
The script partitioned the first disk, creating a small partition for /boot and another encompassing the rest of the disk. Any disks after the first were handed to LVM whole, so their partition tables were zeroed. The second partition on the first disk and the whole of the other disks became physical volumes (PVs). A single volume group (VG) named local was set up to contain all the PVs. Using LVM on the migrated VMs was necessary to retain snapshot ability.
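The order of operations can be sketched as a function that builds the command list without running anything. This is an illustration under assumptions: the device names are hypothetical, the first disk is taken to be partitioned already, and the original script may have zeroed partition tables differently than with `dd`.

```python
def lvm_setup_commands(first_disk, other_disks, vg_name="local"):
    """Return the commands that build one VG from the disks, in order.

    Assumes `first_disk` already has a /boot partition (1) and a second
    partition (2) covering the rest of the disk.
    """
    commands = []
    pvs = [f"{first_disk}2"]
    for disk in other_disks:
        # Zero the first sector so no stale partition table remains.
        commands.append(["dd", "if=/dev/zero", f"of={disk}", "bs=512", "count=1"])
        pvs.append(disk)
    for pv in pvs:
        commands.append(["pvcreate", pv])
    # One volume group spanning every physical volume.
    commands.append(["vgcreate", vg_name] + pvs)
    return commands


plan = lvm_setup_commands("/dev/sda", ["/dev/sdb"])
```

Returning a plan instead of executing it makes the sequence easy to inspect or log before anything touches a disk.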
Logical volumes (LVs) were created to match the filesystem data from the migration document, and then formatted. For /boot, ext3 was used. For all other non-swap filesystems, we chose XFS. We were using reiserfs on our legacy infrastructure but wanted to move to something better supported. XFS allows online growing, even for the root filesystem, and is supported by current distributions. After formatting, the filesystems were mounted in hierarchy order under /newroot.
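A small sketch of the two decisions in this step: which formatter to use for each filesystem, and the order in which to mount them so that parents come before children. The function names, the `"swap"` key, and the device paths are assumptions for the example, not taken from the original script.

```python
def mkfs_command(mountpoint, device):
    """Pick the formatter: swap, ext3 for /boot, XFS for everything else."""
    if mountpoint == "swap":
        return ["mkswap", device]
    if mountpoint == "/boot":
        return ["mkfs.ext3", device]
    return ["mkfs.xfs", device]


def mount_order(mountpoints):
    """Sort mountpoints so parents are mounted before their children."""
    real = [m for m in mountpoints if m != "swap"]
    # "/" has depth 0, "/data" depth 1, "/data/shared" depth 2, and so on.
    return sorted(real, key=lambda m: (m.rstrip("/").count("/"), m))


def mount_commands(filesystems, newroot="/newroot"):
    """Build mount commands for a {mountpoint: device} mapping.

    A real script would also create each target directory first.
    """
    commands = []
    for mountpoint in mount_order(filesystems):
        target = newroot if mountpoint == "/" else newroot + mountpoint
        commands.append(["mount", filesystems[mountpoint], target])
    return commands
```

Sorting by depth is what "mounted in hierarchy" amounts to: /newroot must exist as a mounted filesystem before /newroot/data can be mounted inside it.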
For new, non-migrating VMs that had no archived root filesystem, our custom Gentoo distribution stage4 was used instead of a root archive. Each VM archive was unpacked under /newroot/source_item where source_item was the original path, such as /data. This produced a system identical to the snapshots and GFSes previously archived.
For database VMs, the archived database dumps were copied to /newroot/database_data_dir, where database_data_dir was usually /db. They were left compressed because the database setup part of the script handled them.
The next sections of the script dealt with getting the VM ready to operate in its new Terremark environment.
VMs running on our legacy infrastructure didn’t require a kernel or boot loader inside the VM. VMs on Terremark run under VMware, which is fully virtualized and requires a kernel inside the VM to boot. When we realized that we needed to change the kernel, we considered various options, including running recent vanilla kernels from kernel.org, but ultimately decided to use the CentOS 5.4 kernel.
These kernels were unpacked and made available to the VMs via an NFS share. The script started by copying the kernel, source code, and modules into their appropriate locations. Our Terremark VMs run a CentOS kernel and our custom Gentoo userland.
We used a custom initrd to boot VMs. The mini system that was PXE booted to run the migration script was really a highly functional initrd. New VMs were booted with something slimmer, a modified LVM initrd script that included loading the disk and LVM modules and mounting the root filesystem.
GRUB was installed as the boot loader on the new VMs. It booted the CentOS kernel using the custom initrd, telling the initrd to find the root filesystem at /dev/local/root.
After that, some higher-level configuration was done:
- A file was touched that let us and our tools see whether the VM was in the “migrating” phase.
- /etc/fstab was written
- Puppet, the configuration management tool used on xCloud, was set up to run via init.
- The new network configuration was written to /etc/conf.d/net, /etc/resolv.conf, and other related files.
- The contents of /etc/conf.d/local.start were created.
- The contents of /firstboot.sh were created.
Near the bottom of the script, some services were disabled while others were enabled. The most notable disabled service was vixie-cron. During the testing and verification phase, we didn’t want any scheduled jobs firing and doing things like queuing up emails to send or transactions to process. We also protected against things like this with iptables, described below.
Beyond that and database-related preparation, the major items in the configuration script were setting up NFS and iptables.
Setting up NFS closed the loop on what to do when we could no longer use GFS. NFS works most like GFS but without the shared block device requirement. The migration service determined which VM to make the NFS server when GFS archives existed for a migrating environment. On those VMs, /etc/exports was written with the configuration needed to export the NFS shares to other VMs owned by the customer. For VMs that would become NFS clients, entries for the NFS mountpoints were added to /etc/fstab. The end result was that the contents of the previous GFSes were available at the same paths as on our legacy infrastructure.
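The two halves of that configuration might look like the sketch below, which renders server-side /etc/exports lines and client-side /etc/fstab lines for the same paths. The function names, IPs, and mount and export options are illustrative assumptions; the post does not say which options were actually used.

```python
def exports_lines(shares, client_ips, options="rw,sync,no_subtree_check"):
    """Lines for /etc/exports on the VM chosen as NFS server."""
    lines = []
    for path in shares:
        clients = " ".join(f"{ip}({options})" for ip in client_ips)
        lines.append(f"{path} {clients}")
    return lines


def client_fstab_lines(server_ip, shares, options="rw,hard"):
    """Lines for /etc/fstab on NFS clients.

    Each share is mounted at the same path it is exported from, so the
    data appears where the old GFS mount used to be.
    """
    return [f"{server_ip}:{path} {path} nfs {options} 0 0" for path in shares]


# Hypothetical environment: one server, two clients, one shared path.
server_side = exports_lines(["/data"], ["10.0.0.12", "10.0.0.13"])
client_side = client_fstab_lines("10.0.0.11", ["/data"])
```

Listing client IPs explicitly in the export keeps the shares limited to the customer's own VMs.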
The configuration script set up iptables rules to satisfy the last requirement listed in Part I. With these rules in place, we ensured that new VMs would not communicate outside their internal network. This was very important for customers that bill credit cards or contact outside services. We didn’t want migrating VMs to hit any production services on the Internet before the migration was complete.
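One plausible shape for such rules is sketched below as a function that builds the iptables commands. The actual rules are not shown in the post, so this is an assumption: it allows loopback, replies on already-established connections (so a tester connecting in from outside still gets a response), and traffic to the internal network, then drops everything else outbound. The network range is hypothetical.

```python
import ipaddress


def outbound_lockdown_rules(internal_cidr):
    """Build iptables commands that block new outbound traffic off-network."""
    ipaddress.ip_network(internal_cidr)  # raises ValueError if malformed
    return [
        ["iptables", "-A", "OUTPUT", "-o", "lo", "-j", "ACCEPT"],
        # Let replies to inbound connections out, e.g. a customer testing the site.
        ["iptables", "-A", "OUTPUT", "-m", "state",
         "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        ["iptables", "-A", "OUTPUT", "-d", internal_cidr, "-j", "ACCEPT"],
        ["iptables", "-A", "OUTPUT", "-j", "DROP"],
    ]


def removal_rules(rules):
    """Matching -D commands, for lifting the restrictions later."""
    return [[rule[0], "-D"] + rule[2:] for rule in rules]


rules = outbound_lockdown_rules("10.20.0.0/24")
```

Generating the delete commands from the same list keeps the later removal in step with whatever was installed.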
When complete, the script reported completion to the migration service. This made the VM available for provisioning by the program written by Terremark professional services.

### Step 6: VM Provisioning

Terremark shut down any VMs listed at the “needs provisioning” endpoint of the migration service, moved them to their final, customer-specific networks, and booted them.
When a Gentoo VM boots, /etc/conf.d/local.start is run at the end of the boot order. For non-database VMs, this script checked back in with the migration service a final time to say “all done.” For database VMs, it loaded the customer’s database dumps and established replication before reporting in.
Much of the first-boot script dealt with MySQL. Because most of our customer databases use MySQL, it was the most automated part of the migration process. PostgreSQL was handled with scripts outside the process and later by Puppet.
For MySQL, the process started with creating a config for importing data. It’s nothing special and not necessarily tuned for the VM’s makeup but is good enough to start with. Puppet came around later to tailor the config to the VM.
The bulk of the work was between lines 543 and 568. Most important to the migration process were the data load and the replication configuration. Database archives that had been copied to database VMs were decompressed with gunzip and piped into mysql -B. The data load was done simultaneously on the master and replica and used set sql_log_bin=0 to prevent the master from writing binary logs, because replication was set up later.
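The load step can be pictured as a single shell pipeline. The sketch below only builds that pipeline as a string; how the original script issued `set sql_log_bin=0` is not shown in the post, so sending it ahead of the dump on the same input stream is an assumption, as are the function name and the dump path.

```python
import shlex


def load_dump_command(dump_path):
    """Build a shell pipeline that loads a gzipped dump without binary logging.

    The SET statement is sent first on the same stream, so it applies to
    the session that then receives the dump.
    """
    preamble = "SET sql_log_bin=0;"
    return (
        f"( echo {shlex.quote(preamble)}; gunzip -c {shlex.quote(dump_path)} ) "
        f"| mysql -B"
    )


command = load_dump_command("/db/app.sql.gz")
```

Because `sql_log_bin` is a session setting, it has to be issued in the same mysql session as the load; running it in a separate invocation would have no effect on the import.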
After loading the data, the master was done. The replica used the hold_until_complete function to query the migration service for the status of its master. Once the master reported as “complete,” the replica used the establish_mysql_replication function to set up replication.

### Step 7: Testing and Verification

After the VMs were configured and provisioned, it was time to test the new setup.
The migration team had a collection of tools for fixing up common application-related configuration changes, such as hostnames. Once complete, an initial test was as simple as accessing the site at the new IP. The migration team created a simple gem, eymigrate, to help with customer testing.
During this phase, the migration team and customer were comfortable doing anything that might impact shared assets or database data because they would be reset during cutover. With the iptables rules in place to prevent general outbound access from the VMs, we could assure customers that the VMs being verified could not contact any external production services.
When the customer was satisfied with their new setup, we proceeded to the cutover.

### Step 8: Cutover

Before the cutover, there were a few things to take care of:
- Because public IPs were changing, we worked with our customers to lower DNS TTLs. We did this directly for customers whose domains we hosted.
- Rsync was used to catch up the customer’s shared assets, syncing from the still-live VMs on our legacy infrastructure to the new VMs. This reduced the downtime for the final sync during the cutover. We shortened the sync times by excluding directories containing transient session files or files only used for local processing.
- To cut over databases, we had two options: complete-dump-and-restore or cutover-via-replication. For smaller databases, dump-and-restore was simple and worked fine. Larger databases required converting the new master database VM into a replica of the current master on our legacy infrastructure. Using ssh tunnels between the VMs, our DBAs established replication and got the new master (acting as a replica) in sync, so the cutover just involved breaking the replication link and converting the new master back to a master.
- The migration team and customer agreed on the date and time to perform the cutover.
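The catch-up sync from the preparation list can be sketched as a function that assembles an rsync command with exclusions. The host name, paths, and exclude patterns are hypothetical, and pulling from the legacy VM onto the new VM is an assumption; the post does not say which side initiated the transfer or which flags were used.

```python
def rsync_command(source_host, path, excludes=()):
    """Build an rsync command that pulls `path` from the legacy VM.

    Trailing slashes make rsync copy the directory's contents into the
    same path locally instead of nesting a second directory level.
    """
    command = ["rsync", "-az"]
    for pattern in excludes:
        command.append(f"--exclude={pattern}")
    clean = path.rstrip("/")
    return command + [f"{source_host}:{clean}/", f"{clean}/"]


catch_up = rsync_command("legacy-app1", "/data/shared", ["tmp/sessions/"])
```

Since rsync transfers only what has changed, running the same command again during the cutover window moves just the files modified since the catch-up run.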
At cutover time:

- A maintenance page was posted and the application running on our legacy infrastructure was shut down.
- The database procedure was followed.
- The application was started on the new VMs.
- Necessary DNS changes were made. To complement this, we used [iptables](http://en.wikipedia.org/wiki/Iptables) rules to forward traffic from each of the customer's public IPs on our infrastructure to the corresponding public IP at Terremark.
- The “migrating” file was removed.
- Puppet was run to start things like cron.
- The iptables rules limiting outbound access were removed.

With that, traffic was live on the new setup. Downtime with a maintenance page was generally less than 10 minutes.

My team, including [Edward Muller](http://twitter.com/freeformz) and [Lee Jensen](http://twitter.com/outerim) (now with Big Cartel), worked on the migration service and customer UI, but the migrations would not have been possible without our migration team.

They worked tirelessly with customers to get migrations done while meeting and exceeding the requirements listed in the first post. So, to [Kevin Rutten](http://twitter.com/krutten), [Matt Dolian](http://twitter.com/mdolian), [Will Jessop](http://twitter.com/will_j) (now with 37signals), [Taylor Weibley](http://twitter.com/themcgruff) (also now with 37signals), [Matt Reider](http://twitter.com/mreider) and [Daniel Vu](http://twitter.com/danvu_4444), I say thank you for all the work you did for our customers and Engine Yard.