Due to Internet Spaceships (a.k.a. Spreadsheets in Space), whenever I hear the “I got 99 problems” line, I always think of a station we had in a solar system whose name started with “9-9.” The alliance I flew under had a naming standard where the first three characters or so of a station’s name matched the system name, to cut down on confusion and help you confirm that you knew where you were. That station was named “99 problems but <…> ain’t one.” I don’t remember what was in the gap anymore; there’s a decent chance it wasn’t family friendly anyway.
Anyway, I’m a huge nerd.
On topic…
This time around for what is becoming Tom’s monthly writing exercise, the topic is to make a list of things that can go wrong with SQL Server that are not disk issues. This should be fairly easy for me, since we have more I/O than God, fortunately, but we’ll see how much of a trainwreck I can turn this into.
- Cowboy Sysadmins. I know this is asking for it. I know that there are plenty of good sysadmins out there who are just as pragmatic as good DBAs are. The problem is that they’re not always the only ones working on a project or active on the support rotation. Things can get changed that shouldn’t be when they shouldn’t be. It happens. How do I know this? Well, see, I used to be a Cowboy Sysadmin. I used to be a pretty strong opponent of ITIL and things like Change Management. Then I got a clue and things were better.
- Cowboy Developers. Self-explanatory.
- Linked Servers. I know Tom listed this one in his original post, but it really is a disaster waiting to happen. Long story short, I wound up implementing an LS over a WAN link, and it’s a miracle that it worked. There are a lot of moving parts involved in LSes (there’s a sketch of the basic setup after this list), and when a cluster is involved, it’s even worse. PLUS, when name resolution isn’t working through other means, hostnames need to be added to LMHOSTS (yes, LMHOSTS, not HOSTS!). That says to me there’s some ancient piece of code in MSDTC still in use, and that scares the crap out of me.
- Crap code/Design. And by “Crap,” I mean, “Legacy.” Why is old code always the worst? Were the people here before really that dumb, or did they just not know any better? I mean… the storage engine has to do the exact same thing with 9 indexes on a table now that it had to do a decade ago… (no, I’m not kidding).
- Letting me design your DB. I’m better at this than I was two years ago, but you should still have someone who knows what they’re doing look at it before you do anything else with it.
- Reporting out of your OLTP system. This may be OK… but it may be very, very bad. I’ve seen some doozies, but sometimes it isn’t completely avoidable, so limit it as much as you can. You don’t even need to go full-blown data warehouse or data mart for this, either; a log-shipped copy or a backup restored on another instance may get the scary queries off your back (see the restore sketch after this list).
- Flakey alerting/monitoring system. This doesn’t directly affect your company’s DBs, but if a backup job failed last night and you didn’t get an email alert about it… well, things wouldn’t exactly be OK, would they? (There’s a quick independent check for that after this list, too.)
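First, that linked server sketch. This is roughly what wiring one up looks like; all the names here (REPORTSRV, report_reader, SalesDB) are made up, and the SQLNCLI provider is an assumption that depends on your version. The point is how many separate pieces — provider, security mapping, name resolution, and MSDTC if distributed transactions get involved — all have to cooperate.

```sql
-- Hypothetical remote instance; the provider name varies by SQL Server version.
EXEC sp_addlinkedserver
    @server     = N'REPORTSRV',
    @srvproduct = N'',
    @provider   = N'SQLNCLI',
    @datasrc    = N'reportsrv.example.com';

-- Map all local logins to a specific remote login instead of self-mapping.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname  = N'REPORTSRV',
    @useself     = N'False',
    @locallogin  = NULL,
    @rmtuser     = N'report_reader',
    @rmtpassword = N'********';

-- Four-part-name query across the link; every hop above is a moving part.
SELECT TOP (10) *
FROM REPORTSRV.SalesDB.dbo.Orders;
```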
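Next, the reporting-copy sketch. Restoring last night’s full backup on a second instance WITH STANDBY leaves the copy readable for reports while still able to take later log restores. Database name, share, and file paths here are all hypothetical.

```sql
-- Restore a copy of the OLTP database on a second instance for reporting.
RESTORE DATABASE SalesDB_Report
FROM DISK = N'\\backupshare\SalesDB_full.bak'
WITH MOVE N'SalesDB'     TO N'D:\Data\SalesDB_Report.mdf',
     MOVE N'SalesDB_log' TO N'L:\Log\SalesDB_Report.ldf',
     STANDBY = N'D:\Data\SalesDB_Report_undo.dat', -- read-only, can still apply log backups
     STATS = 10;
```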
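And the alerting check: a quick query against msdb that flags databases with no successful full backup in the last day, independent of whatever your monitoring system did or didn’t email you. The 24-hour threshold and the tempdb exclusion are just placeholders for your own policy.

```sql
-- Databases with no successful full backup in the last 24 hours.
SELECT d.name,
       MAX(b.backup_finish_date) AS last_full_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
      AND b.type = 'D'              -- full backups only
WHERE d.name <> 'tempdb'
GROUP BY d.name
HAVING MAX(b.backup_finish_date) IS NULL
    OR MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());
```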
Tom was hoping to get nine out of us, but I’m tired, this is due tomorrow, and the President is apparently going to drop a bomb on us in five minutes, so this is what you get.
Tom, this whole thing is kind of fun and it really does give me/us easy fodder to write about, so I appreciate it.
I’m not a SQL SysAdmin. I just play around with databases on an as-needed basis. Here are a couple of my inexperienced thoughts.
What about poor patch management, or does that fall under Cowboy Sysadmins? You know, where a new Windows patch breaks older versions of SQL, but it isn’t discovered until after the patch has been applied to the production server. If a test server had been used, the problem would have been caught first.
What about momentary stupidity? I once rebooted a production server by accident: I walked up to a Linux box and pressed Ctrl+Alt+Delete, thinking I was on a Windows box.
I would totally say that Bad Patch Management is separate from the sysadmin one and definitely a good one to put on such a list. Patches need some extra testing to ensure that nothing bad happens to SQL or its DBs when they’re applied, and that is just as true for OS patches as it is for SQL Server patches/hotfixes themselves. You are very much correct.
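If you want a quick sanity check of where a given box stands before and after a patch cycle (test vs. production), something like this is enough; the column aliases are just my own naming.

```sql
-- Compare build, patch level, and edition between test and production instances.
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level,   -- e.g. RTM, SP1
       SERVERPROPERTY('Edition')        AS edition;
```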
As for your second one… There’s always the human element involved 😀
Well, that brings you to 8 items. One more and you’ll have your 9.