People use Wi-Fi inside Toronto’s Fairview Mall on July 8. Yader Guzman/The Globe and Mail
Engineers at Rogers Communications Inc. began the sixth step of a seven-step process to upgrade the core infrastructure supporting the company’s wireless and broadband networks at 2:27 a.m. on July 8.
Two hours and 16 minutes later, a coding error was introduced that set off a cascade of events, leading to a massive outage that left millions of Canadians without cellphone, internet or home phone service for at least a day.
The shutdown of one of Canada’s dominant telecommunications networks created widespread chaos. Rogers failed to send out four emergency alerts to its wireless customers in Saskatchewan, including three tornado warnings and a dangerous person report.
Rogers customers were unable to call 911, and Interac’s debit system was also affected, causing problems for both consumers and businesses. In Toronto, the outage forced Canadian singer-songwriter The Weeknd to postpone a concert that was supposed to be held at the Rogers Centre that night.
Initially, even Rogers wasn’t sure what was causing the outage. But weeks later, in a detailed filing in response to questions from the Canadian Radio-television and Telecommunications Commission, the company gave a full account of its version of events.
Those documents, which were released publicly by the CRTC in redacted form on Friday, provide new details about the outage and offer an early look at the set of facts that Rogers executives will rely on Monday, when they are expected to testify about the incident at a public hearing before the House of Commons committee on industry and technology.
Like many of its peers, Rogers currently has a core network that supports all the services it offers. The core is essentially the brain of the network. It receives, processes, transmits and connects all voice, wireless data, Internet and television traffic.
The telco had begun the seven-phase process to upgrade the core in February, following what the company described in its CRTC filing as a comprehensive planning process that included budget and project approvals, risk assessment and testing.
The first five phases had gone smoothly. But at 4:43 a.m. on July 8, code was entered that removed a routing filter. In telecommunications networks, data packets are guided and routed by devices called routers, and filters prevent these routers from becoming overwhelmed by limiting the number of possible routes presented to them.
Removing the filter caused all possible routes to the Internet to be passed to the routers, pushing several of the devices past their memory and processing capacity. That, in turn, caused the core network to shut down.
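The failure mode described above can be illustrated with a minimal sketch. It is not Rogers’ configuration or any vendor’s software: the `Router` class, the capacity of 1,000 routes, the 5,000 advertised routes and the address prefixes are all hypothetical, chosen only to show how a filter keeps a route table within a device’s limits and what happens when the filter is removed.

```python
from typing import Callable, Iterable, List, Optional


class RouterOverloaded(Exception):
    """Raised when the route table grows past the router's capacity."""


class Router:
    """Toy router with a fixed-size route table (hypothetical model)."""

    def __init__(self, capacity: int,
                 route_filter: Optional[Callable[[str], bool]] = None):
        self.capacity = capacity          # assumed limit on stored routes
        self.route_filter = route_filter  # None means "accept every route"
        self.table = set()

    def receive(self, advertised: Iterable[str]) -> int:
        """Store the accepted routes and return the table size."""
        for prefix in advertised:
            # With a filter in place, only approved routes are stored.
            if self.route_filter is not None and not self.route_filter(prefix):
                continue
            self.table.add(prefix)
            if len(self.table) > self.capacity:
                raise RouterOverloaded(
                    f"{len(self.table)} routes exceed capacity {self.capacity}"
                )
        return len(self.table)


def advertised_routes(count: int) -> List[str]:
    """Generate `count` distinct illustrative /24 prefixes."""
    return [f"10.{i // 256}.{i % 256}.0/24" for i in range(count)]


if __name__ == "__main__":
    routes = advertised_routes(5000)

    # Filter in place: only a small subset of routes reaches the table.
    filtered = Router(capacity=1000,
                      route_filter=lambda p: p.startswith("10.0."))
    print("with filter:", filtered.receive(routes), "routes stored")

    # Filter removed: every route is presented and the table overflows.
    unfiltered = Router(capacity=1000)
    try:
        unfiltered.receive(routes)
    except RouterOverloaded as err:
        print("without filter:", err)
```

In this toy model the overload is a clean exception. Real routers can respond to the same condition in different ways, which fits the company’s point that its two vendors’ equipment is designed differently.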
Rogers uses equipment from different manufacturers in its network core, and the two vendors from which the company buys routers have different designs and approaches to managing traffic and protecting equipment from overload. Those differences are at the heart of the outage Rogers experienced, the company said in filings.
But, at first, the company’s technicians had not yet determined the cause of the catastrophe. Rogers apparently considered the possibility that its networks had been attacked by cybercriminals. At 6 a.m., Jorge Fernandes, who was the company’s chief technology officer at the time, contacted his counterparts at Telus Corp. and BCE Inc.’s Bell Canada to inform them of the outage and warn them to watch out for cyberattacks, the company said in its filing.
Although Bell and Telus offered to help, Rogers quickly determined that it would not be able to transfer its customers to its rivals’ networks because certain elements of Rogers’ network, such as its centralized user database, were inaccessible as a result of the outage. In any case, rival networks would not have been able to cope with the sudden surge in traffic from Rogers’ 10.2 million wireless subscribers, the telco said.
Mr. Fernandes was in Portugal when the outage began, and immediately began making arrangements to return to Canada, according to two sources familiar with his whereabouts. The Globe is not identifying the sources because they were not authorized to speak publicly on the matter.
Meanwhile, Rogers’ network team gathered at the company’s network operations center in Brampton, Ontario, restored network access, and began trying to figure out the cause of the outage.
To communicate with each other and coordinate the recovery effort, some employees began swapping their SIM cards for Bell or Telus SIM cards they had received in 2015 as part of an emergency contingency plan established between wireless carriers.
It wasn’t until 8:54 a.m., roughly four hours after the outage began, that the company publicly acknowledged the situation. “We know how important it is for our customers to stay connected,” the telco tweeted through its customer service account. “We are aware of the issues currently affecting our networks and our teams are fully committed to resolving the issue as soon as possible. We will continue to keep you updated as we have more information to share.”
The company’s disclosures to the CRTC suggest the late reaction may have had to do with problems logging into online accounts used to communicate with customers. The telco said going forward it will ensure its crisis response teams have alternative methods to access social media accounts protected by two-factor authentication linked to Rogers devices.
It took the network team all day to restore the network. They had to disconnect the equipment causing the problem, redirect traffic and confirm network stability before gradually bringing services back online. The process had to be done methodically to avoid overloading the network and causing another outage, the company said.
“Our wireless services are starting to recover and our technical teams are working hard to get everyone back online as soon as possible,” the company tweeted shortly before 10 p.m.
The next morning, Rogers announced that it had restored services to the “vast majority” of its customers. But intermittent problems persisted throughout the weekend.
In an open letter to customers on Sunday, Rogers CEO Tony Staffieri pledged to invest more in testing, monitoring and artificial intelligence to improve the reliability of the company’s networks. He put the cost of the changes at about $10 billion over three years.
The wireless giant will also physically separate its core wireless and wired networks to ensure that any future outages do not affect both services, Mr. Staffieri said.
Last week, the company replaced Mr. Fernandes, a former Vodafone executive, with veteran telecom executive Ron McKenzie. Mr. McKenzie was previously the president of Rogers for Business, the division that provides wireless and Internet services to corporate customers.
Mr. McKenzie will begin his new role with an appearance before the House of Commons committee that is looking into the shutdown. The committee, which is made up of members of Parliament from the four main federal parties, is expected to question him, Mr. Staffieri and Rogers chief regulatory officer Ted Woodhead about the five-day billing credit the company is offering to compensate its customers for the shutdown. The committee may also ask about the network and operational changes the telco plans to make to avoid future disruptions.
As all this is happening, Rogers is awaiting regulatory approval for its contested takeover of Shaw Communications Inc. for $26 billion, before the July 31 deadline. The Competition Bureau is trying to block the merger, arguing it will lead to poor service and higher prices for mobile phone customers.