Why DID Facebook, Instagram, WhatsApp and Facebook Messenger go down worldwide yesterday? What went wrong and why it took almost seven HOURS to fix
- Facebook has blamed yesterday’s global outage on an internal technical issue
- It took down the site, as well as Instagram and WhatsApp, for almost seven hours
- The US tech giant has not ruled out foul play, but an external cyber attack is unlikely
- Experts say sabotage by an insider would be possible, but the outage could simply have been a mistake
- It took almost seven hours to fix by resetting servers manually, but engineers initially couldn’t get into HQ because keycards stopped working
- Issue exacerbated because many staff are still working from home due to Covid
Facebook, Instagram and WhatsApp were all brought down for almost seven hours yesterday in a massive global outage.
Problems began at around 16:45 BST (11:45 ET), leaving users unable to access the three platforms, as well as Facebook Messenger and Oculus, for the rest of the evening.
Facebook, which owns all the services, has blamed the outage on a bungled server update and insists it was not an attack from outside the company.
The US tech giant said the problem was caused by a faulty update that was sent to its core servers, which effectively disconnected them from the internet.
But what exactly went wrong and why did it take almost seven hours to fix? Here is MailOnline’s breakdown of the issue…
WHAT IS THE DOMAIN NAME SYSTEM AND HOW DOES IT WORK?
The Domain Name System, or DNS, is the directory of the internet.
Whenever you click on a link, send an email, open a mobile app, often one of the first things that has to happen is your device needs to look up the address of a domain.
There are two sides of the DNS network: the authoritative side, which holds the records for domains such as webpages and other online content, and the resolver side, which looks up those records on behalf of the devices trying to access that content.
Every domain needs to have an authoritative DNS provider, servers which store DNS records. Amazon, Cloudflare and Google are among the bigger names in authoritative DNS server provision.
On the other side of the DNS system are resolvers. Every device that connects to the Internet needs a DNS resolver.
By default, these resolvers are automatically set by whatever network you’re connecting to.
So, for most Internet users, when they connect to an ISP, or a WiFi hot spot, or a mobile network, the network operator will dictate what DNS resolver to use.
The problem is that these DNS services are often slow and don’t respect your privacy.
What many Internet users don’t realise is that even if you’re visiting a website that is encrypted, indicated by the green padlock in your browser’s address bar, that doesn’t keep your DNS resolver from knowing the identity of all the sites you visit.
That means, by default, your ISP, every WiFi network you’ve connected to, and your mobile network provider have a list of every site you’ve visited while using them.
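To make the two sides of DNS concrete, here is a minimal Python sketch under heavy simplifying assumptions: the domain names, the addresses (taken from reserved documentation ranges) and the `AuthoritativeServer` and `Resolver` classes are all hypothetical, the zone is found by simply taking the last two labels of a name, and a real resolver talks to servers over the network rather than calling methods. It shows a resolver answering lookups from authoritative records, caching them, and keeping a record of every name it is asked about, which is the privacy point made above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthoritativeServer:
    """Hypothetical authoritative server: stores the records for its domains."""
    records: dict[str, str]  # domain name -> IP address

    def answer(self, name: str) -> Optional[str]:
        return self.records.get(name)


@dataclass
class Resolver:
    """Hypothetical resolver: looks names up for devices, caches the answers
    and, as a side effect, sees every name that is requested."""
    authorities: dict[str, AuthoritativeServer]  # zone -> authoritative server
    cache: dict[str, str] = field(default_factory=dict)
    query_log: list[str] = field(default_factory=list)

    def resolve(self, name: str) -> Optional[str]:
        self.query_log.append(name)  # logged even if the site itself uses HTTPS
        if name in self.cache:
            return self.cache[name]
        # Simplification: treat the last two labels as the zone,
        # e.g. "www.example.com" -> "example.com". Real DNS walks down from the root.
        zone = ".".join(name.split(".")[-2:])
        server = self.authorities.get(zone)
        address = server.answer(name) if server is not None else None
        if address is not None:
            self.cache[name] = address
        return address


resolver = Resolver({
    "example.com": AuthoritativeServer({"www.example.com": "192.0.2.10"}),
    "example.org": AuthoritativeServer({"news.example.org": "198.51.100.7"}),
})
print(resolver.resolve("www.example.com"))   # 192.0.2.10
print(resolver.resolve("news.example.org"))  # 198.51.100.7
print(resolver.resolve("www.example.com"))   # 192.0.2.10 (from the cache)
print(resolver.query_log)  # ['www.example.com', 'news.example.org', 'www.example.com']
```

A real resolver would also respect each record's time-to-live instead of caching answers forever.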
Why did Facebook go offline?
Facebook issued a statement saying the cause of the problem was a configuration change to the company’s ‘backbone routers’, which coordinate network traffic between the tech giant’s data centres.
‘This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt,’ the statement said.
Web security firm Cloudflare offered more details about what happened, revealing that Facebook had effectively vanished from the internet.
The social media company made a series of updates to its Border Gateway Protocol (BGP) routes, Cloudflare’s chief technology officer John Graham-Cumming said, causing it to ‘disappear’.
BGP is the system networks use to exchange routing information on the internet, and it is what takes people to the websites they want to access.
It is essentially the roadmap of the internet: the Domain Name System (DNS) translates a website’s name into its IP address, and BGP tells networks how to reach that address.
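BGP itself only spreads the news of which networks can reach which blocks of IP addresses; each router then sends traffic along the most specific route it knows for the destination. A minimal Python sketch of that lookup, using the standard `ipaddress` module, is below. The prefixes (reserved documentation ranges) and the neighbour names `peer-A` and `peer-B` are hypothetical, and a real router's table is far larger and held in specialised structures rather than a dictionary.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table learned via BGP: prefix -> neighbour that announced it.
ROUTES = {
    ipaddress.ip_network("203.0.113.0/24"): "peer-A",
    ipaddress.ip_network("203.0.113.128/25"): "peer-B",  # more specific half of the /24
}


def next_hop(destination: str, routes: dict) -> Optional[str]:
    """Longest-prefix match: pick the most specific prefix containing the address."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None  # no route: the destination is unreachable from here
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(next_hop("203.0.113.5", ROUTES))    # peer-A (only the /24 matches)
print(next_hop("203.0.113.200", ROUTES))  # peer-B (the /25 is more specific)
print(next_hop("198.51.100.1", ROUTES))   # None (no route at all)
```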
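Because the routes to Facebook’s own authoritative DNS servers disappeared as a result of the BGP problems, DNS resolvers all over the world could no longer reach them and stopped resolving the company’s domain names.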
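The knock-on effect on DNS can be sketched in a few lines of Python. This is a toy model, not a reconstruction of Facebook's network: the prefixes are documentation ranges, and the `NAMESERVER` address and `ZONE` records are hypothetical. It shows that once the route covering a domain's authoritative DNS server is withdrawn, the name stops resolving, even if the route to the web server itself were still in place.

```python
import ipaddress
from typing import Optional

DNS_PREFIX = ipaddress.ip_network("192.0.2.0/24")     # hypothetical nameserver network
WEB_PREFIX = ipaddress.ip_network("198.51.100.0/24")  # hypothetical web server network
NAMESERVER = ipaddress.ip_address("192.0.2.53")       # hypothetical authoritative DNS server
ZONE = {"www.example.com": "198.51.100.10"}           # records that server would return


def reachable(addr, announced: set) -> bool:
    """An address can be reached only if some announced prefix covers it."""
    return any(addr in net for net in announced)


def resolve(name: str, announced: set) -> Optional[str]:
    """A resolver with no route to the authoritative server gets no answer
    (a real resolver would hand the device an error such as SERVFAIL)."""
    if not reachable(NAMESERVER, announced):
        return None
    return ZONE.get(name)


before = {DNS_PREFIX, WEB_PREFIX}
after = before - {DNS_PREFIX}  # the faulty change withdraws the DNS route

print(resolve("www.example.com", before))  # 198.51.100.10
print(resolve("www.example.com", after))   # None: the name no longer resolves
print(reachable(ipaddress.ip_address("198.51.100.10"), after))  # True, but nobody can find it
```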
Why were Instagram, WhatsApp and Facebook Messenger also down?
It wasn’t just Facebook that went offline – its associated services Instagram, WhatsApp and Facebook Messenger were affected, too. Some people also reported issues with Facebook’s virtual reality headset platform, Oculus.
This is because the tech giant has a centralised, single back end for all of its products.
Facebook runs its own systems through the same servers, meaning everything needed to fix the problem – from digital engineering tools to messaging services, even key-fob door locks – was also taken offline.
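Why one fault took everything down at once can be illustrated with a small dependency graph. The component names and links below are hypothetical and drastically simplified, not Facebook's real architecture; the Python sketch just walks the graph to find everything that depends, directly or indirectly, on a failed component.

```python
from __future__ import annotations

from collections import deque

# Hypothetical dependency map: component -> components it relies on.
DEPENDS_ON = {
    "facebook": ["backbone"],
    "instagram": ["backbone"],
    "whatsapp": ["backbone"],
    "messenger": ["backbone"],
    "internal_tools": ["backbone"],
    "door_badges": ["internal_tools"],
    "remote_access": ["backbone"],
}


def affected_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a failed component to its dependants.
    dependants: dict[str, list[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, []).append(component)

    # Breadth-first search outwards from the failure.
    down: set[str] = set()
    queue = deque([failed])
    while queue:
        for component in dependants.get(queue.popleft(), []):
            if component not in down:
                down.add(component)
                queue.append(component)
    return down


print(sorted(affected_by("backbone", DEPENDS_ON)))
print(affected_by("internal_tools", DEPENDS_ON))  # {'door_badges'}
```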
Matthew Hodgson, co-founder and CEO of Element and technical co-founder of Matrix, said the outage illustrated the advantage of having a ‘more reliable’ decentralised system that doesn’t put ‘all the eggs in one basket’.
‘There’s no single point of failure so they can withstand significant disruption and still keep people and businesses communicating,’ he added.
How many people were affected?
Downdetector, which tracks outages, said it was the biggest failure it has ever seen, with 10.6 million problem reports around the world.
In total, Facebook has 2.9 billion monthly active users.
The issues started at 16:44 BST (11:44 ET), with nearly 80,000 reports for WhatsApp and more than 50,000 for Facebook, according to Downdetector.
From around 22:30 BST (17:30 ET), some users were reporting that they were able to access the four platforms once again. However, Facebook did not work again for many people until at least an hour after that.
WhatsApp said it was back up and running ‘at 100 per cent’ as of 3:30 BST (22:30 ET) this morning.
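The ‘almost seven hours’ figure can be checked against the Downdetector times above with a little date arithmetic. In this Python sketch the times are the BST figures reported in the article; the calendar date, 4 October 2021, is supplied only so that the arithmetic works across midnight, and the one-hour offset for Facebook follows the ‘at least an hour after that’ wording.

```python
from datetime import datetime, timedelta

# BST (UTC+1) is five hours ahead of US Eastern Time (UTC-4) in early October.
BST_TO_ET = timedelta(hours=5)

start = datetime(2021, 10, 4, 16, 44)                      # first Downdetector spike (BST)
partial_recovery = datetime(2021, 10, 4, 22, 30)           # some users back (BST)
facebook_recovery = partial_recovery + timedelta(hours=1)  # "at least an hour after that"
whatsapp_full = datetime(2021, 10, 5, 3, 30)               # WhatsApp "at 100 per cent" (BST)


def hm(delta: timedelta) -> str:
    """Format a duration as hours and minutes, e.g. 5h46m."""
    hours, remainder = divmod(int(delta.total_seconds()), 3600)
    return f"{hours}h{remainder // 60:02d}m"


print(hm(partial_recovery - start))   # 5h46m
print(hm(facebook_recovery - start))  # 6h46m
print(hm(whatsapp_full - start))      # 10h46m
print((start - BST_TO_ET).strftime("%H:%M ET"))  # 11:44 ET
```

Only once the wider Facebook recovery is included does the total approach seven hours; WhatsApp's full recovery came later still.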
Could it have been a cyber attack?
Interestingly, Facebook’s statement is carefully written and doesn’t rule out foul play.
That being said, the chances of it being an external cyber attack seem unlikely.
A massive denial-of-service hack that could overwhelm one of the world’s most popular sites would require either coordination among powerful criminal groups or a very innovative technique.
Sabotage by an insider, however, would be theoretically possible, according to tech experts.
What’s also eye-opening is that the outage hampered Facebook’s ability to address the problem, because it took down internal tools needed to fix it.
This meant the issue lasted for nearly seven hours, which is highly unusual.
RECENT FACEBOOK OUTAGES
Last month, a technical issue with Facebook-owned Instagram caused an outage that plagued users around the world for around 18 hours.
Problems started just after 8am on Thursday. About 18 hours later, at 2am on Friday, Instagram announced the problem had been fixed.
However, the last time Facebook, Instagram and WhatsApp went down at the same time was in June.
More than a thousand people in countries including the United States, Morocco, Mexico, Bolivia and Brazil reported outages.
There were also two Facebook platform outages in March, with Instagram down on March 30, and all three down on March 19.
The outage compounded a difficult week for Facebook, which has faced accusations of easing up on efforts to stop misinformation, allowing hate to be magnified on its platforms and being aware that Instagram can harm teenage girls’ mental health.
The disruption also occurred just 24 hours after a former Facebook employee gave an interview to CBS News after leaking documents about the social network.
Whistleblower Frances Haugen, who is scheduled to testify before a Senate subcommittee today, said the company had prioritised ‘growth over safety’.
Facebook insisted it was ‘just not true’ to suggest the company encouraged bad content or did nothing in response.
Cybersecurity specialist Jake Moore said: ‘It is quite interesting that Facebook’s statement has not ruled out foul play.
‘Like the locks on a bank safe, the money inside is only as secure as the person with the keys – cybersecurity is as much about a company’s own internal security procedures as it is about fending off outsider attacks.’
He reiterated that it was ‘not due to an external cyber attack’ because web blackouts more often originate from an undiscovered software bug or human error.
So was it a mistake by someone within Facebook?
There’s every chance it could have been an accident rather than an intentional act of sabotage.
It has been claimed that a Facebook staff member may have accidentally deleted large sections of the code which keeps the website online.
Facebook said its engineering teams had identified ‘configuration changes’ to its backbone routers that brought its services to a halt.
The company said these changes caused a disruption to network traffic and blocked communication between its data centres. Employees’ work passes and email were also reportedly affected by the internal issue.
Why did it take so long to resolve the problem?
When Facebook’s platforms went offline, engineers rushed to the company’s data centres to reset the servers manually, only to find they couldn’t get inside.
New York Times’ technology reporter Sheera Frenkel told BBC’s Today programme this was part of the reason it took so long to fix the issue.
‘The people trying to figure out what this problem was couldn’t even physically get into the building’ to work out what had gone wrong, she said.
To make matters worse, one insider claimed the outage was further exacerbated because large numbers of staff are still working from home in the wake of Covid, meaning it took longer for them to get to the data centres.
Engineers were rushed to the company’s data centres in Santa Clara, California.
Facebook has not yet gone into much detail about how the issue was finally fixed, but it is understood that engineers had to manually reset the servers where the problem originated.
Software testing expert Adam Leon Smith of BCS, The Chartered Institute for IT, said: ‘It is unlikely the issues were directly caused by people working from home; however, it is quite possible that it took so long to restore the service because of reduced staffing within the data centre.
‘This would compound the problem because the nature of the failure meant that remote access to the data centre was also unavailable.’
How much did the outage cost?
During the blackout, Facebook shares plunged by five per cent, wiping an estimated $7 billion (£5 billion) off founder Mark Zuckerberg’s personal fortune.
The website Fortune also estimated that seven hours of downtime could have cost the company up to $100 million (£73 million) in lost ad revenue.
But it’s not just Facebook which will have lost out.
Businesses who rely on its services are also likely to have lost huge sums of money, although so far there have not been any cost estimates for exactly how much.
NetBlocks, which tracks internet outages and their impact, estimates that the outage cost the global economy $160 million (£117 million).
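The dollar and sterling figures above can be cross-checked with simple arithmetic. The exchange rate in this Python sketch is an assumption inferred from the article's own conversion of $100 million to £73 million, not an official rate for the day.

```python
# Dollar estimates quoted in the article, in millions of US dollars.
ESTIMATES_USD_M = {
    "Zuckerberg's personal fortune": 7_000,
    "Facebook ad revenue (Fortune)": 100,
    "Global economy (NetBlocks)": 160,
}

# Assumed rate, inferred from the article's own $100m -> £73m conversion.
GBP_PER_USD = 0.73


def to_gbp(usd_millions: float) -> float:
    """Convert millions of dollars to millions of pounds at the assumed rate."""
    return usd_millions * GBP_PER_USD


for label, usd in ESTIMATES_USD_M.items():
    print(f"{label}: ${usd:,}m is roughly £{to_gbp(usd):,.0f}m")

# Fortune's figure spread evenly over about seven hours of downtime.
print(f"Ad revenue at risk per hour: about ${100 / 7:.1f}m")
```

At that rate $7 billion comes to roughly £5.1 billion, and $160 million to roughly £117 million, in line with the NetBlocks conversion.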
What are the chances of it happening again?
The huge global outage Facebook experienced is a fairly uncommon one, although there’s not a lot the company can do to avoid a similar situation because of its centralised back end system.
Along with the Fastly outage in June – caused by a single customer changing their settings – and Cloudflare going offline in 2020, it shows the problem of having a single point of failure for a huge number of services that people use.
There are currently no obvious solutions to this, but this latest outage is likely to reignite the debate around internet infrastructure.
For many individuals and businesses too, the incident showed just how much they depend on Facebook and its services not just to communicate, but also to log in to other platforms.
In response, people have been encouraged to consider using other credentials beyond their Facebook log-in details to access other online services.
WHAT WEBSITES ARE MOST SECURE?
Cybersecurity firm Dashlane looked at 22 different websites and ranked them based on how secure they are and their login protocols.
One point each was awarded for SMS/email authentication and for a software token form of authentication, while three points were awarded for the use of hardware tokens.
The cybersecurity firm considered anything less than full marks, which required all three security measures, to be a fail.
2018 UK Rankings
5/5 Points – PASS
- Battle.net
2/5 Points – FAIL
- Amazon
- Apple
- Evernote
- Patreon
- Slack
1/5 Point – FAIL
- Airbnb
- eBay
- Indeed
- Yahoo!
0/5 Points – FAIL
- Asos
- TripAdvisor
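Dashlane's scoring scheme, as described above, is simple enough to express directly. In this Python sketch the measure names are hypothetical labels; the point values and the full-marks pass mark come from the text.

```python
from __future__ import annotations

# Points per security measure, as described in the ranking methodology.
POINTS = {"sms_or_email": 1, "software_token": 1, "hardware_token": 3}
MAX_SCORE = sum(POINTS.values())  # 5


def score(measures: set[str]) -> tuple[int, str]:
    """Total the points for a site's measures; only full marks is a pass."""
    total = sum(POINTS[m] for m in measures)
    return total, "PASS" if total == MAX_SCORE else "FAIL"


print(score({"sms_or_email", "software_token", "hardware_token"}))  # (5, 'PASS')
print(score({"sms_or_email", "software_token"}))                    # (2, 'FAIL')
print(score({"software_token"}))                                    # (1, 'FAIL')
print(score(set()))                                                 # (0, 'FAIL')
```

Because a hardware token alone is worth three points, a 2/5 score can only mean SMS/email authentication plus a software token, and a 1/5 score means exactly one of those two.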