How Telehouse America prepared its data centers for the Superstorm - and the lessons it learned

It was exactly one month ago, on October 30, that Hurricane Sandy made landfall, bringing unprecedented havoc to the eastern seaboard of North America.

A tropical hurricane had turned into a Superstorm. It was European weather modelling that first predicted Hurricane Sandy would track north and west onto mainland America and not, as normally happens with big weather systems originating in the Caribbean, track north and east before fizzling out over the North Atlantic.

The weather modelling gave firms an opportunity to start putting their contingency plans in place.

For Telehouse America, this meant hitting the phones and arranging some meetings with authorities and landlords.

The company had three sites in the path of the storm. Teleport on Staten Island is a two-megawatt facility operating at 80% utilization. 25 Broadway, running half a megawatt, was 80% utilized, but since its closure was announced operations have been winding down and it is now running at 55% utilization. 85 10th Avenue is about 40% committed, running around one quarter of a megawatt of power.

Once it knew what was coming, the company began preventative maintenance, checking and testing equipment down to the level of hoses and belts. Measures included fuelling tanks to 90% of their capacity. It made sure the chillers and gensets were supplied, and it tested for weak links in the battery strings.

The company knew it would have a 30-minute warning before the power went down.

However, what no one predicted was the scale of the devastation.

David Kinney, Director, Facility Engineering & Planning at Telehouse America, says that one lesson learned was about longevity. “One mistake was planning for 48 hours, next time we’d plan for 96 hours.”

Logistics considerations included working on staff plans to ensure there would be enough people in the right places once the storm struck.

“When natural events occur, one of the problems is getting people to and from the data center. Once the transport system goes down, we had to ensure we had enough coverage. If guys couldn’t leave, we needed to have enough people to ensure people could rotate shifts,” says Fred Cannone, Director of Marketing & Sales, Telehouse America.

When it hit

While the company could prepare by speaking with its suppliers and landlords, what it couldn’t do was trigger disaster recovery plans for its clients.

Cannone says: “The facilities team were managed from a micro level, minute to minute, and we were communicating to clients by email and web updates and drafting notices every two hours, and the phones were staffed 24/7. We did have quite a few clients checking in regularly and the ops team fielded many questions from concerned customers. Medium to large customers were preparing to engage their own disaster recovery plans and they needed to communicate those to third parties like ourselves. The most common question was: ‘What’s the status of the fuel?’”

The company admits that it learned some lessons: “We could have done a better job drilling down into more detail and recapping from a customer perspective. That’s now our methodology so that customers can make better decisions on their disaster recovery so that if there is a threat of failure they could lift the load. It is about giving clients ample information in time so they could make decisions on their DR plans,” Cannone says.

Naturally, communications was where the company found itself busiest. Much of the telecoms service depends on where the local points of presence (PoPs) are, so if one of the facilities in the area had problems, everyone ended up having the same kind of issues. When Verizon’s West Street PoP was flooded, it affected everywhere those circuits led, he says.

“One thing we did well was to shift communications load. If they had multiple vendors, customers could fail over to different carriers, though clearly some people didn’t take advantage of that choice. Our technology guys could quickly run a cross connect to some other carrier. The ability to react or proactively engage this redundancy was great. So we set up connections to multiple ISPs to bring them back to life,” says Cannone.

Telehouse’s mission has historically been to provide carrier-neutral data center facilities around the world. Teleport opened in 1989 and has never had an outage, Cannone says. 25 Broadway, which is closing next year, has been around since 1997, and 85 10th Avenue was opened in 2011 as the company is becoming a carrier in its own right. “We provide the space, infrastructure, utility and back-up power and vendor partners (AT&T, Verizon, Level3) provide the actual telecoms environment. We provide the means to connect. That’s the core business.”

Exercising disaster recovery plans

Telehouse America stayed up throughout the hurricane for a number of reasons. While luck was undoubtedly a factor, there was also process and planning.

The fuel kept arriving. None of the sites has backup power equipment below ground. The generators are on the first and fourth floors and on the roof.

And going off the grid is not uncommon for the company. “In summer we’re part of a load curtailment strategy with the utility. Should there be an extraordinary load placed on the grid, they ask us to go off grid so they maintain the power to the population. In the past summer this happened four or five times. When there is this high demand we transfer and run our facilities on generators for six to eight hours at a time,” says Kinney. “And once a year we conduct a pull-the-plug test. The load goes on batteries, UPS and the generators run for 10 hours. This is us exercising our disaster recovery plans.”

The hurricane has changed how the business operates at Telehouse. Cannone says there have been many follow-up meetings and outreach to customers. “We spotlighted what were the actual problems that occurred in this event. Customers had issues they had not planned for and now recognise they need a layered approach to a business continuity plan. We’ve identified what worked and what didn’t.”