This is a chronicle of events between 2007-09-27 and 2007-10-02. It's not quite an "Internet outage", but it had mostly the same amount of drama that a real outage would have had, so I'm keeping that word in the title of this page.
My home network consists of a couple Linksys switches (one upstairs, one downstairs) and some desktop and laptop PCs which connect to each other via ethernet. This network connects to the Internet through an OpenBSD machine which acts as my router/firewall (among other things). The OpenBSD machine has been doing this for several years, across several version upgrades, with a configuration that was untouched in all that time. More on this later.
I run traffic pattern graphs on this machine. Shortly after 2:00 AM on September 27, the traffic looked like this:
Unknowing, I drove to work as normal on that day. When I got there, I attempted to connect back to my home machine to set up ssh tunnels (a regular daily activity for me). I was able to connect after a couple tries, but the connection only remained useful for a few seconds. Then, it "froze", or became unresponsive to my keystrokes. I repeated this a few times, with the same results. I looked at the traffic graph and saw that it looked highly irregular, and that this irregularity had begun somewhere around 2:00 in the morning.
My initial assumption was that the ADSL modem (a Westell model, which the phone company had sent me when I originally signed up about 10 years ago, at my other house, back when CenturyTel was known as Century Telephone) needed to be reset. Unfortunately, I wasn't able to do that remotely. I did reboot the OpenBSD router (by sneaking in a shutdown -r now command in the brief window of usability), but it made no difference.
Meanwhile, my wife had also noticed problems using the Internet from her laptop, and called me. I explained that I had seen problems as well, but that I couldn't fix them remotely. I told her that I would try to power cycle the ADSL modem when I got home, unless she wanted to try it herself. (To the best of my knowledge, she did not attempt this.)
When I got home that evening, I tried power cycling the modem; tried rebooting the OpenBSD router; tried reseating the phone cord leading from the ADSL modem to the wall jack; but none of that helped. The problem had remained consistent all day, despite all my attempts to solve it: the PPPoE connection could be established, and would work for a brief period of time (15 to 60 seconds at best), and then would die. This repeated over and over, as the software on the OpenBSD machine kept retrying every time it died.
I called CenturyTel's "DSL help desk" and spoke with them. I believe the person I reached this first time was named Emily, but that might be incorrect. I asked her whether anything had changed on their end. She typed something, and told me she was not showing any changes or outages in Ohio. She asked me what version of Windows I was using. I told her it was an OpenBSD router that had been working, unchanged, for several years. She told me that they could not support me unless I had a Windows or Macintosh machine connected to the ADSL modem.
I took my wife's Vista laptop downstairs, and connected it to the ADSL modem. I don't know a damned thing about Windows, and don't want to, so I had no idea how I was supposed to supply the PPPoE username and password to it. I called CenturyTel again, and spoke with the next person (Ron, or Brian, or something like that -- they don't speak very loudly or very clearly when introducing themselves). I told him that I had been told I must connect a Windows laptop to the ADSL modem, that I had now done this, and that it still was not working (at all; not even the brief periods of connectivity I had been getting with OpenBSD). I asked him whether I needed to supply a username and password; he said I did not. He said he would initiate a repair ticket, and that a technician should be in contact with me within 24-48 hours.
So, for the rest of Thursday, and all day Friday, we had no Internet at all.
On Saturday morning, we still had not been contacted. I called them again, and spoke to person #3 (whose name I don't think I caught), who said that no repair ticket had ever actually been filed. He said he would file one, but that their technicians don't work on weekends, so someone would contact me on Monday, "between 8 and 5".
I drove to Best Buy and bought a laptop. Actually, it wasn't quite that simple. I drove to Best Buy, looked over all their laptops, and asked whether I could have one without Vista. They said that all their laptops have Vista on them, and that in order to get one with XP, I'd have to order one through their business program (or something like that). So I looked at the least offensive of the Vista laptops. There was a Celeron on clearance for $350, and a dual-core one for $500. The salesperson told me that the dual-core one would have longer battery life, and it had more RAM in it anyway, so I said I'd like to buy one of those. He consulted his inventory computer and informed me that they didn't have any left. And that the only store in the area that had one was in Sandusky (about 45 minutes away), and they only had their display model for sale.
So I asked whether they had any of the $350 one left. He checked again, and informed me that, no, they didn't have any of those either. But since it was on a clearance sale, they were willing to sell me the display model. I said that was acceptable. They took it over to the service area, and gathered up the accessories and documentation for it, and told me that they would have to "restore" it to its normal state; and therefore, I should come back in a couple hours. I paid them the money, and did that, and came back and got it.
So now I had a machine with a wireless NIC in it, ready to "steal" me some Internet (arrr). My wife asked me whether she could borrow my laptop to do the same thing. I looked at her strangely, and asked why she didn't just use her own laptop for that purpose. Hers was better than mine, after all (despite also being a Vista machine), and had features mine didn't. Somehow, she had been using her machine for several months without being aware that it had a wireless NIC in it. (We don't have a wireless router in our house at the moment.)
She said that one of the local McDonald's restaurants has wireless access. So, while I was still going through the initial boot-up and configuration of Vista, she took off. Before I had finished (it takes a while, as many of you may know), she returned, disappointed, and informed me that one must pay money to activate the WiFi at McDonald's. Apparently she was not willing to do that.
Her next plan was to go to her mother's house, because she knew that her mother had a wireless router (having been the one to help set it up originally). I said I'd like to go with her. So we packed up the kids and made it a family visit, and spent several hours there. I even managed to finish a rather impressive Kingdom of Loathing ascension that I'd been in the middle of.
While there, my wife's family members asked her why she hadn't attempted to use wireless Internet access from one of our neighbors. She had never tried it, because she didn't know she had a wireless NIC. I hadn't tried it yet, because my own laptop was only a couple hours old at that point, and I simply hadn't had the opportunity yet. I hadn't even managed to download the 26 Vista update patches yet (those happened by the end of the day).
On Sunday, I tried "stealing" Internet access from the surrounding community. From my desk on the southeast side of our house, I wasn't able to find any networks. From my wife's desk on the northwest side, I was able to get a mediocre signal for a network named "NETGEAR" and a very poor intermittent signal for a network named "MARINAWIFI3" or something similar. (We live a block or two away from a marina, so I presume it was theirs.) The "NETGEAR" network sounded promising. I believed that it was probably my father's unsecured router (he lives two houses west of us), but I wasn't certain.
In any case, the "NETGEAR" network worked well enough for casual purposes. I was able to do reasonable turn-playing in my online games, and my wife was able to do... whatever it is that she does.
On Monday, with the CenturyTel problem still occurring unchanged, I went to work again as usual, hoping that someone at CenturyTel would notice that a router on their end was acting flaky, and reboot it, and that that would magically solve the problem. I left my wife a sheet of paper with all the information about the problem I could manage to convey on paper (things like "We have a static IP (209.142.155.49) and therefore we might be on a different router than most of the other Lorain customers."). When I got back from work, the problem still had not been solved. My wife had called them, and had learned that -- again -- no problem ticket had actually been filed. She spoke with them at some length, apparently, and went through some of their basic diagnostic routines. Apparently she had managed, through luck, to get one of the people with more brains than a begonia, and had been told how to supply the PPPoE username and password to Vista in order to test the ADSL modem and connection properly using her Vista laptop. Surprisingly, this had worked -- the connection was stable for at least 10 minutes or so while the laptop was connected directly to the ADSL modem. By the time I came home, though, she had reconnected the ethernet cable to the OpenBSD router, and the problem we'd been having since Thursday continued unabated.
I called CenturyTel again, spoke to person #5 (I think...), and reported the problem again, including all the information I knew. I asked him, repeatedly, to please tell me what had been changed on their end. Because at this point, I knew beyond any reasonable doubt that something had changed. He said he would file a problem ticket, and that someone would contact me within 24-48 hours. (This is apparently a phrase they use when they want me to stop talking to them forever.) I told him that three separate people at CenturyTel had "filed a problem ticket", none of which had actually been filed. I asked him what the problem ticket's number was. He said it wouldn't have a number until it "goes through" tomorrow. Exasperated, I gave up. We had an event to attend that evening, so I had no time to pursue the matter on Monday evening after that useless half hour phone call.
On Tuesday morning, there was still no change. I went to work, and did some quick checking of Google for OpenBSD pppoe stuff. The beginner's guide which I had originally followed when setting up userspace PPPoE several years ago told me that the new kernel PPPoE driver (about 2 years old now) was considered superior, despite having slightly less flexibility. In any case, the configuration looked simpler, which is generally a good sign.
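For anyone curious what "simpler" means here, a kernel PPPoE setup boils down to roughly the following. This is only a sketch modeled on the example in the pppoe(4) manual page, not a copy of my actual files: the interface name em0, the username, and the password are placeholders, and the exact syntax may differ between OpenBSD releases, so check the manual page for the version you are running.

```
# /etc/hostname.em0 -- the NIC cabled to the ADSL modem ("em0" is a placeholder)
up

# /etc/hostname.pppoe0 -- 'username' and 'password' are placeholders
inet 0.0.0.0 255.255.255.255 NONE \
        pppoedev em0 authproto pap \
        authname 'username' authkey 'password' up
dest 0.0.0.1
!/sbin/route add default 0.0.0.1
```

As I understand the manual page, the 0.0.0.0 and 0.0.0.1 addresses are wildcards that get replaced with whatever the provider's end hands out during negotiation. The authproto value depends on what the provider uses, so pap is an assumption too.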
When I came home, I backed up /etc, took down the userspace PPPoE configuration, and replaced it with the kernel PPPoE setup. This involved changing a couple files and rebooting, then realizing that I also needed to change another file, and another file, etc. But the end result is that, barring some tweaks in my pf.conf(5) for performance, it's working:
CenturyTel never did call back, as far as I know.