Today WordPress.com was down for approximately 110 minutes, our worst downtime in four years. The outage affected 10.2 million blogs, including our VIPs, and appears to have deprived those blogs of about 5.5 million pageviews.
What happened: We are still gathering details, but it appears an unscheduled change to a core router by one of our datacenter providers messed up our network in a way we haven’t experienced before, and broke the site. It also broke all the mechanisms for failover between our locations in San Antonio and Chicago. All of your data was safe and secure; we just couldn’t serve it.
What we’re doing: We need to dig deeper and find out exactly what happened and why, how to recover more gracefully next time, and how to isolate problems like this so they don’t affect our other locations.
I will update this post as we find out more and have a more concrete plan for the future.
I know this sucked for you guys as much as it did for us — the entire team was on pins and needles trying to get your blogs back as soon as possible. I hope it will be much longer than four years before we face a problem like this again.
Update 1: We’ve gathered more details about what happened. There was a latent misconfiguration from a few months ago, specifically a cable plugged in someplace it shouldn’t have been. Something called the spanning tree protocol kicked in and started trying to route all of our private network traffic to a public network over a link that was much too small and slow to handle even 10% of our traffic, which caused high packet loss. This “sort of working” state was much worse than if the network had just gone down, and it confused both our systems team and our failsafe systems. It is not clear yet why the misconfiguration bit us yesterday and not earlier. Even though the network issue was unfortunate, we responded too slowly in pinpointing the issue and taking steps to resolve it using alternate routes, making the downtime 3-4x longer than it should have been.
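To illustrate why a “sort of working” link can confuse failover logic, here is a minimal sketch of a health check that treats heavy packet loss as a failure in its own right instead of asking only "is the link up?". This is not our actual monitoring code: the names `LinkState`, `classify_link`, and `should_fail_over`, and the 5% loss threshold, are all assumptions made up for the example.

```python
from enum import Enum


class LinkState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"  # some probes succeed, but loss is too high
    DOWN = "down"


def classify_link(probe_results, loss_threshold=0.05):
    """Classify a link from a list of probe outcomes (True = reply received).

    loss_threshold is a hypothetical cutoff: above 5% loss the link is
    called degraded even though some probes still get through.
    """
    if not probe_results:
        raise ValueError("need at least one probe result")
    loss = 1 - sum(probe_results) / len(probe_results)
    if loss >= 1.0:
        return LinkState.DOWN
    if loss > loss_threshold:
        return LinkState.DEGRADED
    return LinkState.HEALTHY


def should_fail_over(state):
    # Treat a degraded link the same as a hard failure, so a link that
    # still answers some probes cannot keep traffic pinned to it.
    return state is not LinkState.HEALTHY
```

A check that only asks whether any probe got a reply would report a link with 30% loss as fine. With the sketch above, `classify_link([True] * 7 + [False] * 3)` is classified as degraded, and `should_fail_over` returns true for it.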
Great rescue work…..keep up the good work….
The outage just made me cognizant of how much we depend on wordpress and how grateful I am that wordpress provides this free service to bloggers.
P.S. In the last 12 hours, the number of total hits for my blog keeps fluctuating, receding from a higher figure to a lower figure, then later, back again to the higher figure. Is anyone else experiencing this problem?
I have come to depend on the reliability of WP.com. These things happen, although they shouldn’t. Sometimes I am amazed how a redundant system can come crashing down when you pull just one plug, while it can survive many other plug pullings elsewhere.
thanks for the communication.
Noticed but it was no problemo!
Whew! For a minute there I thought it was my little Toshiba laptop….Seriously though, we knew your server was on the fritz; we just had to wait until it came up. Good work, and thanks!
I notice it was not just WordPress that was down…I also noticed that the Google Servers were down for a few hours. YouTube also was having a few problems
i am not sure if this made it worse for WordPress but just something i noticed earlier today from more than one site.
Today the internet seemed to be having a bad hair day all at once…Glad you are working on a solution…
Have an awesome day
Sammi
I missed the downtime completely, but let me take this opportunity to say Thank You for allowing so many of us to communicate on our Blogs…you are the Best.
Yea, great response time. Way to go!
Thanks for ‘splainin’, Ricky! 😉
Thanks for all of the hard work you do so that we can use our blogs everyday.
WordPress you are Outstanding! Keep Rockin’! your friends, XR VOLUME
Hello ma.tt!
Stupid STP, and they want me to enable that at work. I say just don’t plug a switch in to a switch. It’s good to see all these comments are, well, nice, really.
I’d like to thank you for WordPress as well. Just don’t let this even happen again — ever!
😉
[that last bit was a joke]
The technicians and other staff at wordpress.com do such a great job of providing your customers a service that is second to none. Thank-you for giving us such a great blogging experience.
Thank you for the update and good luck in the future ^_^
Also good luck blogging everybody. 🙂
you guys rock thanks for the fast update
Interesting failure mode. Are you sure it’s STP? STP is a layer 2 protocol designed to facilitate redundant links between switches and shouldn’t normally affect the WAN links. Sounds more like the routing protocol you use had a senior moment and by mistake gave a very thin pipe a better metric than it should have.
Didn’t even realise (I must not blog a lot…) but thanks for this update. Appreciated it.
thanks for the transparency and detailed information. great job, you guys rock !
hmm.. we know it happens.. thnx for hard work
You guys do a great job and the web breaks from time to time. Truly appreciate that you got it back and investigated the issue. Best wishes.
they’re machines — from time to time they break…
Thanks so much for all the hard work. 2 hours is not that bad for 4 years (.006% time down, 99.994% running); in my books that’s better than Lysol claims to kill germs!
Thanks for being so forthcoming with info – much appreciated.
Thanks for your hard work and WordPress is rocking!
Thanks for being so quick to repair the problem AND for being so transparent by keeping us informed in a timely fashion. That’s why my blog is here on wordpress.com!
Ask me why I chose WP?!?!?
Great job.
is there any way to recover my lost post from my iphone? the photos were saved but the essay is gone
Thanks guys! Didn’t notice though 🙂
We understand…
good service anyway.
hmmm, seems like I didn’t get affected but that’s why some blogs wouldn’t work. Anyway, I’m glad to see it’s fixed and my data is safe!
I definitely like the honesty.
I just love my word press blog and although I noticed the downtime I really didn’t know what that meant until I read your very sincere message. I really appreciate having the opportunity to blog and I’m happy you and your team are so conscientious. Very best wishes!
thanks for resolving it and your great explanation.
Again a great service and thanks for the good updates all day long! Keep up your great work!
Thank you for the updates! I only have one question though. I am a new blogger and I would like to know how can my blog appear on the search engines?
Holy cow!!! A whole 110 minutes!!! lol.
See, kids, there are honest people out there.
Now if only your name was TOYOTA. 😉
I didn’t have this affect my session.
Guess I missed it.
Well, at least I’m safe 🙂
Received a tweet that I was to be interviewed, that pointed me here. What’s the question? Did the outage affect me? Unknown at this time. I do know that I needed to login again, therefore a password change might be in order….
You guys are doing great things, don’t let a little thing like the spanning tree protocol trip you up again. Should we suspect foul play? 🙂
I slept right through it.
it didn’t affect my blog.
Thank you for being so efficient! Keep up the good work!
“…specifically a cable plugged someplace it shouldn’t have been…”
Ah, human error. 🙂
I had to chuckle about the failover thing. I worked for the Sec. State’s office here in RI and we hosted among many other things the Central Voter Registration system.
It was designed by EDS but to my knowledge they never did a failover test. I’m just waiting to see what happens.
Glad you guys are up and running again.
Was wondering as to why I could not log on to wp. Thought something wrong with my internet provider’s server. Thanks for being so upfront about it. Really appreciate the good work. Cheers!!
Wonder why, up to now, it’s very hard to access the wordpress.
The homepage seems to take forever to load, to edit my blog, to update also seems to take forever.
So frustrating.
Hope you guys can fix the problem quickly and thanks again for all the hard work 😀
Thanks for letting us know about this problem and getting things working again =)
Nice shirt / scarf combo.
10 million blogs is a lot of blogs.
way to go to solve it and communicate to us.
Thanks for being so keen in alerting us. I may not have noticed it, but it was good that you fixed it in time and promise to avoid it happening again!
i had no idea it was even down. but thanks for fixing it! wp rocks
No sweat. Thanks for the quick troubleshoot.
You handled this very well! Thanks!
Thanks for this very quick update! We appreciate it
I missed one view! lol
YOU ROCK!
Thanks for the transparency (as others have said) and the quick recovery. As luck would have it, today I am live-blogging a very popular food event in Arkansas. I only lost connectivity for a few minutes when I needed it, but I seem to have lost about 100 page views in stats. Any specifics on that?
Arfoodie: please contact support to request help, we haven’t had any downtime since this post. https://wordpress.com/support/contact/
that’s why i am unable to publish smoothly my posts….. thank god it won’t affect my already published posts. by the way thanx for this post update 🙂 at least i know the reason behind it 🙂
Thanks for this hard work. Now i have the opportunity to understand people with a foreign language. Thanks and good luck for your future.
I knew something was wrong, but I thought it was me
You guys are the best, thanks!