wiki:DisasterRecovery
Last modified on 01/31/12 15:16:41

Last updated 2012 Jan 31.

Prerequisites for our network

  • Power
  • Network connectivity (from UW IT, watch eOutage).
    • uplink to the campus network (u.washington.edu servers) and the Internet is also important for some services (email, web, etc.)
    • CAC NOC phone is 206-221-6000 if network is dead
  • DNS (from the department; non-local queries are forwarded to CAC)
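When a service looks dead, it helps to tell a DNS failure apart from a network or service failure before rebooting anything. The following is a minimal sketch of that triage, not an existing SIG tool: the function names `resolves`, `tcp_reachable`, and `triage` are hypothetical, and the default port 80 is an assumption suited to the web services only.

```python
import socket


def resolves(hostname):
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def triage(hostname, port=80, timeout=3.0):
    """Classify the state of hostname:port as 'dns', 'unreachable', or 'ok'."""
    if not resolves(hostname):
        return "dns"          # name lookup failed: suspect DNS
    if not tcp_reachable(hostname, port, timeout):
        return "unreachable"  # name resolves, but nothing answers on the port
    return "ok"
```

For example, `triage("sig.biostr.washington.edu")` returning `"dns"` would point at the DNS prerequisite rather than at uvula itself.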

Our locations

Note that some areas may be OK while others are out:

  • 128.95.44 subnet - Biostr dept, and Jim's and Bill's offices
  • 128.208.161 subnet - server room; also has its own switch in rack which may need to be reset
  • 128.95.228 subnet - T170B/165

Wireless is on a completely separate UW subnet, 69.91.128.0/255.255.128.0, and uses its own DNS server.
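To work out which of these areas an affected machine sits in, its IP address can be matched against the subnets above. This is a sketch under one stated assumption: the three-octet subnets are treated as /24 networks, which the notes do not say explicitly. The wireless netmask 255.255.128.0 is equivalent to a /17 prefix. The names `LOCATIONS` and `locate` are hypothetical.

```python
import ipaddress

# Subnets from the list above. The /24 prefix lengths are an assumption;
# the wireless netmask is taken verbatim from the notes.
LOCATIONS = {
    ipaddress.ip_network("128.95.44.0/24"): "Biostr dept and offices",
    ipaddress.ip_network("128.208.161.0/24"): "server room",
    ipaddress.ip_network("128.95.228.0/24"): "T170B/165",
    ipaddress.ip_network("69.91.128.0/255.255.128.0"): "UW wireless",
}


def locate(address):
    """Return the location label for an IP address, or None if it is not ours."""
    ip = ipaddress.ip_address(address)
    for network, label in LOCATIONS.items():
        if ip in network:
            return label
    return None
```

Calling `locate` on the address of an unreachable machine gives a quick hint about which subnet, and therefore which switch or room, to check first.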

Virtual Machine Hosts

Most SIG servers are now virtual machines (VMs) running on a few VMwareEsxi hosts. See the section "Connecting to ESXi for the first time" for how to install the Windows GUI client used to connect to the ESXi hosts. You can view a VM's console from the GUI client, but not while sitting at the VM host server itself. The VM hosts are only accessible from on campus.

Critical SIG Linux Servers

uvula

Symptoms: http://sig.biostr.washington.edu/ website down, CVS down, SIG mailing lists down
  • Location: T170b; large Dell Precision 530 in the back right corner from the door
  • What to do: check console for obvious messages, check network connectivity, reboot

vagal

Symptoms: NFS data down (`ls /usr/local/data/` hangs)
  • Location: VM on sigvm1, one of the Dell PowerEdge 2950 servers in the rack
  • What to do: see NfsTroubleshooting, possibly reboot (unfortunately sometimes the best way to clear NFS hangs)
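Because the symptom is a command that never returns, any automated check needs a timeout or it will hang too. The sketch below shows one way to do that; the name `probe` and the 10-second default are assumptions, not part of any existing SIG script. It deliberately does not wait for the child after killing it, since a process blocked on a dead NFS mount may not exit until the mount recovers.

```python
import subprocess


def probe(command, timeout=10.0):
    """Run command (a list of arguments) and return 'ok', 'failed', or 'hung'."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: do not wait again, the child may be stuck in I/O.
        proc.kill()
        return "hung"
    return "ok" if returncode == 0 else "failed"
```

For the check described above, the call would be `probe(["ls", "/usr/local/data/"])`; a result of `"hung"` matches the vagal symptom, while `"failed"` suggests the path is missing or unmounted instead.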

axon

Symptoms: main LDAP server down (if both LDAP servers are down, no one can log in), sig-imap server down, `ls /home/someuser` hangs
  • Location: T170b; Dell OptiPlex 755 near Joshua's desk, clearly labeled "axon"
  • What to do: reboot, if that does not work see LdapTroubleshooting, NfsTroubleshooting
  • Note: other machines get automount information from LDAP about the location of each home directory
  • For a complete list, use `ldapsearch -xLLL | grep automountI`

Important SIG Linux Servers


deltoid

Symptoms: FME down, secondary LDAP server down (if both LDAP servers are down, no one can log in) 
  • Location: VM on sigvm12, one of the Dell PowerEdge R610 servers in the rack
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: FME depends on vagal NFS at startup, and takes about 15 minutes to build out its trees

xiphoid

Symptoms: QueryManager down
  • Location: VM on sigvm12, one of the Dell PowerEdge R610 servers in the rack
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: this is a production server and does not depend on LDAP or NFS for startup

sphenoid

Symptoms: DXBrain down, various Wirm repos are down (bmap, eyelab, celo)
  • Location: VM on sigvm12, one of the Dell PowerEdge R610 servers in the rack
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: this is a production server and does not depend on LDAP or NFS for startup

cuboid

Symptoms: DXBrain CSM queries down, SIG Publications website http://sigpubs.biostr.washington.edu/ is down
  • Location: VM on sigvm11, one of the Dell PowerEdge R610 servers in the rack
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: WIX CSM depends on NFS at startup

lamina

Symptoms: SIG wiki website http://trac.biostr.washington.edu/ is down, SVN is down
  • Location: VM on sigvm11, one of the Dell PowerEdge R610 servers in the rack
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: WIX CSM depends on NFS at startup

incus

Symptoms: FMA live server is down, Nagios monitoring system not alerting
  • Location: VM on sigvm2, one of the Dell OptiPlex machines in T170b
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: this is a production server and does not depend on LDAP or NFS for startup

thorax

Symptoms: SIG backups and DaAtlases are down
  • Location: F500f, one of the Dell PowerEdge 2950 servers in the rack; clearly labeled "thorax"
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: this is a production server and does not depend on LDAP or NFS for startup
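After a full outage, the startup notes above imply an order in which to bring servers back. The sketch below encodes only the servers whose notes mention startup dependencies and derives one valid order with the standard-library `graphlib` module (Python 3.9 or later). The names `STARTUP_DEPENDS_ON` and `startup_order` are hypothetical. Mapping the unspecified "NFS" dependency of cuboid and lamina to vagal is an assumption; only deltoid's note names vagal explicitly.

```python
from graphlib import TopologicalSorter

# Each server maps to the set of servers that must be up before it starts.
STARTUP_DEPENDS_ON = {
    "vagal": set(),
    "deltoid": {"vagal"},  # FME depends on vagal NFS at startup
    "cuboid": {"vagal"},   # note says "NFS"; vagal is an assumption
    "lamina": {"vagal"},   # note says "NFS"; vagal is an assumption
    "xiphoid": set(),      # does not depend on LDAP or NFS
    "sphenoid": set(),     # does not depend on LDAP or NFS
    "incus": set(),        # does not depend on LDAP or NFS
    "thorax": set(),       # does not depend on LDAP or NFS
}


def startup_order(depends_on):
    """Return the servers in an order where every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())
```

The exact position of the independent servers in the result is arbitrary; the only guarantee is that vagal appears before deltoid, cuboid, and lamina. Remember that FME on deltoid still needs about 15 minutes to build out its trees after it starts.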