Re: Recovery via Unrecovery



On Sun, 02 Sep 2007 18:00:15 +0000, David Gersic wrote:

The chief problem here seems to be a lack of testing.

Testing. Testing! *froths at the mouth for a few minutes*

The inability of some people to understand the essential need for thorough
testing would astound me, if my ability to feel surprise at the stupidity
some people display hadn't already been burnt out from continuous overload.

*sets the not-so-way-back machine to last week*

A couple of my co-workers and I had been spending the last month or
so preparing to roll out some updates to our main web servers. A shiny new
service, some bug fixes, a couple other minor tweaks. A goodly portion of
this time was spent testing everything as best we could to make certain
that nothing would unexpectedly fail come the big switch. Many minor
changes were made and potential problems were nipped in the bud. However,
knowing that the tendrils of long-standing systems were many, and that
the updates might well affect older functions that we did not know of
and therefore could not test, we asked the cow-orkers who regularly spoke
with the customers, and whose job it was to know which services and
aspects of the server were being used, to test everything and report the
results.

Come the day of the switch, we had green lights reported from everyone. A
few more bugs had been found and fixed, and the commands were given. The
new, improved servers were placed into position and the old ones were
pulled back.

And everything worked perfectly, right? Of course not. Clients called.
Cow-orkers complained. Management asked pointed questions. We pulled the
new servers right back out again and began to track down what had
happened. Every single error would have shown up with the bare minimum of
testing. A full three-quarters of the errors could be tracked down to one
specific cow-orker's clients. Her response when asked why she hadn't
reported the problem? 'Oh, was that today?' She'd done no testing at all,
but rather had rubber-stamped the go-ahead.

I've added the areas that generated errors to my testing list, so
those particular items won't cause a problem again, but that doesn't
remedy the core problem. I really don't want to get to the point where I
*have* to test everything myself, but if I don't do it myself I distrust
the result.

I need a drink.