wiki:DisasterRecovery

Version 9 (modified by joshuadf, 6 years ago)

For Windows-related problems, see WindowsDisasterRecovery.

Prerequisites for our network

  • Power
  • Network connectivity (from UW IT, watch eOutage).
    • The uplink to the campus network (u.washington.edu servers) and the Internet is also needed for some services (email, web, etc.).
    • If the network is dead, call the CAC NOC at 206-221-6000.
  • DNS (from the department; non-local queries are forwarded to CAC)
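
The network and DNS prerequisites above can be checked in order from any Linux machine that is still up. The following is a minimal sketch, assuming `getent` and a Linux `ping` (with `-c` and `-W`) are installed. The helper names `check` and `check_prereqs` are hypothetical, and the hosts you pass in are your own choice rather than an official list.

```shell
# check LABEL COMMAND...: run COMMAND quietly and report OK or FAIL.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# check_prereqs HOST...: hypothetical wrapper, meant to be run by hand.
# For each host, test name resolution first, then basic reachability.
check_prereqs() {
    for host in "$@"; do
        check "DNS lookup of $host" getent hosts "$host"
        check "ping $host" ping -c 1 -W 2 "$host"
    done
}
```

For example, `check_prereqs sig.biostr.washington.edu` prints one line per check. If the DNS lookup fails but pinging a known IP address works, the problem is more likely DNS than the uplink.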

Our locations

Note that some areas may be OK while others are out:

  • 128.95.44 subnet - Biostr dept, Jim's, and Bill's offices
  • 128.208.161 subnet - server room
  • 128.95.228 subnet - T170B/165

Wireless is on a completely separate UW subnet, 69.91.128.0/255.255.128.0, and uses its own DNS server.
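
When sorting out which areas are affected, it can help to map an address back to one of the locations above. A rough sketch follows; `sig_location` is a hypothetical helper name, and it assumes the three wired subnets are /24 networks, which the list above implies but does not state.

```shell
# sig_location IPV4: print which of the areas listed above an address is in.
sig_location() {
    case $1 in
        128.95.44.*)   echo "Biostr dept, Jim's, and Bill's offices" ;;
        128.208.161.*) echo "server room" ;;
        128.95.228.*)  echo "T170B/165" ;;
        69.91.*)
            # Wireless is 69.91.128.0/255.255.128.0: third octet 128-255.
            third=${1#69.91.}
            third=${third%%.*}
            if [ "$third" -ge 128 ] 2>/dev/null; then
                echo "UW wireless"
            else
                echo "unknown"
            fi ;;
        *) echo "unknown" ;;
    esac
}
```

For instance, `sig_location 128.208.161.10` (an address made up for illustration) reports the server room.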

Critical SIG Linux Servers

uvula

Symptoms: http://sig.biostr.washington.edu/ website down, NFS data down (`ls /usr/local/data/` hangs), CVS down, SIG mailing lists down
  • Location: T170b, large Dell Precision 530 in the back right corner from the door
  • What to do: check console for obvious messages, check network connectivity, reboot (unfortunately sometimes the best way to clear NFS hangs)
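
Because a hung NFS mount makes `ls` block indefinitely, it is safer to test the path under a time limit than to run `ls` directly in your only working shell. This is a minimal sketch assuming GNU coreutils `timeout` is installed; `path_responds` is a hypothetical helper name. On some kernels or mount options a process stuck on NFS may not be killable at all, so treat a hang of the check itself as a failure too.

```shell
# path_responds PATH [SECONDS]: succeed if listing PATH finishes in time.
# Fails if ls itself fails (e.g. missing path) or is killed at the limit.
path_responds() {
    path=$1
    limit=${2:-5}
    # SIGKILL is the signal most likely to interrupt a process blocked on NFS.
    timeout -s KILL "$limit" ls "$path" >/dev/null 2>&1
}
```

Example use: `path_responds /usr/local/data/ || echo "NFS data not responding; check uvula"`.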

axon

Symptoms: main LDAP server down (if both LDAP servers are down, no one can log in), sig-imap server down
  • Location: T170b; Dell OptiPlex 755 near Joshua's desk, clearly labeled "axon"
  • What to do: reboot; if that does not work, see LdapTroubleshooting

Important SIG Linux Servers

thorax

Symptoms: SIG backups, DaAtlases are down
  • Location: Harris 321, one of the Dell PowerEdge 2950 servers in the rack; clearly labeled "thorax"
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: this is a production server and does not depend on LDAP or NFS for startup

sphenoid

Symptoms: DXBrain down, various Wirm repos are down (bmap, eyelab, celo)
  • Location: Harris 321, one of the Dell PowerEdge 1750 servers in the rack; clearly labeled "sphenoid"
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: this is a production server and does not depend on LDAP or NFS for startup

cuboid

Symptoms: DXBrain CSM queries down, SIG Publications website is down
  • Location: Harris 321, one of the Dell PowerEdge 1750 servers in the rack; clearly labeled "cuboid"
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: WIX CSM depends on NFS at startup

xiphoid

Symptoms: FME down, FMA OWL queries don't work
  • Location: Harris 321, one of the Dell PowerEdge 1750 servers in the rack; clearly labeled "xiphoid"
  • What to do: check console for obvious messages, check network connectivity, reboot
  • Note: FME depends on NFS at startup
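
The symptom lists for the six servers above can be condensed into a small lookup for quick triage. This is only a sketch: the function names and the symptom keywords are hypothetical shorthand chosen for illustration, and the mapping simply restates the entries above.

```shell
# sig_server_for SYMPTOM: print "host (location)" for a failing service.
sig_server_for() {
    case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
        website|nfs|cvs|lists) echo "uvula (T170b)" ;;
        ldap|imap)             echo "axon (T170b)" ;;
        backups|daatlases)     echo "thorax (Harris 321)" ;;
        dxbrain|wirm)          echo "sphenoid (Harris 321)" ;;
        csm|publications)      echo "cuboid (Harris 321)" ;;
        fme|fma)               echo "xiphoid (Harris 321)" ;;
        *)                     echo "unknown"; return 1 ;;
    esac
}

# needs_nfs_at_startup HOST: succeed only for hosts whose notes above say
# a service depends on NFS at startup. Failure means "not noted", not "no".
needs_nfs_at_startup() {
    case $1 in
        cuboid|xiphoid) return 0 ;;
        *)              return 1 ;;
    esac
}
```

For example, `sig_server_for ldap` prints `axon (T170b)`. After a wider outage, `needs_nfs_at_startup cuboid` succeeding is a reminder to confirm NFS is healthy before rebooting that host.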

various /home/ exports

Symptoms: `ls /home/someuser` hangs; http://sig.biostr.washington.edu/~someuser/ website down
  • Location: T170b; see below and the SIG Inventory Database
  • What to do: see NfsTroubleshooting
  • Some key home directories:
    • axon:/home/detwiler (Todd)
    • stylus:/home/andrew (brain browser)
    • axon:/home/brinkley (Jim's webpage)
  • Note: user home directories are kept on the machine each user theoretically uses most often (most Windows users use axon) and are exported via NFS. Other machines get automount information from LDAP about the location of each home directory.
  • For a complete list, see HomesList or use ldapsearch -xLLL | grep automountI