For the second time in two weeks, and the second time in 13 months, Engine Yard experienced upstream routing issues today that impaired service.
The issue was shorter today, but still unfortunate.
We apologize for this issue, and the previous one as well.
The last event came and went without a reasonable response from the upstream provider, and that lack of response is now going to cost them our data center, and by extension Engine Yard and our customers, as customers! There's a lesson here for us about how quickly, thoroughly and transparently we need to report operational problems on our end.
One thing that is very sad about this story is that this bandwidth provider is Verizon. Verizon bought UUNet, which had previously bought alter.net (I've possibly reversed UUNet and alter.net in this description; Google couldn't enlighten me in 5 seconds, so I gave up). UUNet has a very long and illustrious history as a bandwidth provider, and certainly served us well in our first year! We'll miss UUNet on a philosophical level. :-(
Please note that we're very satisfied with the service, notifications, and attention we receive from our data center. It's their supplier that is at issue here, and it's unreasonable to expect them to replace a long-term provider on the basis of a single outage.
Here's a formal statement from our data center and bandwidth provider, Herakles Data:
Subject: FW: Network Service Disruption You Experienced
Date: Mon, 19 Nov 2007 14:42:29 -0800
From: “Darren E. Canady”
To: “Tom Mornini”
Please be advised that the latest word from our upstream provider of the alter.net network is that they had a “card crash” on one of their core (edge) routers which impacted service to the Sacramento area.
The issue has not fully been resolved by the carrier, but Herakles has taken action to advertise to the world that the alter.net network is a “highly undesirable” path for traffic to flow across to access the Herakles IP space.
Further, we've increased our bandwidth with one of our other upstream providers to handle the overflow of traffic from alter.net. This is in place now.
Additionally, we already have a service order being processed to obtain bandwidth from another provider, to replace the alter.net connection. We're not sure what the timeline is for implementation of this service, however, the provider has been made aware that we need this service implemented ASAP and we'll have it up as soon as it is available from the carrier.
We apologize once again for the inconvenience and REALLY APPRECIATE your support in providing traces and contacting us in order to give us something substantial to provide to the carrier to document the event.
Please contact us IMMEDIATELY should the issue re-occur. We're reluctant to shut the alter.net interface down completely as we do want to keep as much redundancy in place as possible. However, be advised that doing so does still cause advertisements to occur that specify the alter.net path is a valid/viable route, even though it's been flagged as less desirable by Herakles. If it does become a significant problem, even after the actions taken, we will shut the interface down to mitigate service disruption while we await either the new service or assurance from alter.net that it's safe to bring the port back up, whichever occurs first.
Thanks and Regards,
Darren E. Canady
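For anyone wondering how a path can be flagged as "highly undesirable" and yet still carry traffic, here is a rough sketch. The email doesn't say which mechanism Herakles used; one common approach is AS-path prepending, where a network repeats its own AS number in the announcement sent to one upstream so that route looks longer to the rest of the world. Everything below is an assumption for illustration: the AS numbers come from the range reserved for documentation, the `Route`, `prepend` and `best_route` names are made up, and real BGP route selection weighs several attributes before path length.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Route:
    upstream: str              # label for the upstream that carried the announcement
    as_path: Tuple[int, ...]   # nearest AS first, originating AS last


def prepend(route: Route, origin_asn: int, times: int) -> Route:
    """Return a copy of the route with the origin AS repeated `times` more times."""
    return Route(route.upstream, route.as_path + (origin_asn,) * times)


def best_route(candidates: Sequence[Route]) -> Optional[Route]:
    """Simplified selection: shortest AS path wins (real BGP checks more first)."""
    if not candidates:
        return None
    return min(candidates, key=lambda r: len(r.as_path))


# Hypothetical AS numbers from the documentation range, not the real ones.
ORIGIN_ASN = 64500        # stands in for the data center
ALTER_ASN = 64501         # stands in for the troubled upstream
OTHER_ASN = 64502         # stands in for a second upstream

via_alter = prepend(Route("alter.net", (ALTER_ASN, ORIGIN_ASN)), ORIGIN_ASN, 3)
via_other = Route("other upstream", (OTHER_ASN, ORIGIN_ASN))

# A network that hears both announcements prefers the shorter path.
print(best_route([via_alter, via_other]).upstream)

# A network that only hears the prepended announcement still uses it.
print(best_route([via_alter]).upstream)
```

The second case is the caveat in the email: a de-preferenced path is still a valid route, so networks with no alternative keep sending traffic over it until the interface is actually shut down.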