Changes between Version 9 and Version 10 of DisasterRecovery


Timestamp:
01/31/12 15:04:44 (7 years ago)
Author:
joshuadf
Comment:

--

  • DisasterRecovery

    v9 v10  
    1 For Windows-related problems, check WindowsDisasterRecovery 
     1Last updated 2012 Jan 31. 
    22 
    33== Prerequisites for our network == 
     
    1010 
    1111== Our locations == 
     12 
    1213Note that some areas may be OK while others are out: 
    1314  * 128.95.44 subnet - Biostr dept, Jim, and Bill's offices 
    14   * 128.208.161 subnet - server room 
     15  * 128.208.161 subnet - server room; also has its own switch in the rack, which may need to be reset 
    1516  * 128.95.228 subnet - T170B/165 
    1617 
    1718Wireless is on a completely separate UW subnet, 69.91.128.0/255.255.128.0, and uses its own DNS server. 
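The subnet list above can be turned into a quick lookup for deciding which area a given address belongs to. This is a minimal sketch, not part of the wiki page: the `/24` prefix lengths are an assumption (the page gives only the first three octets), and the names `LOCATIONS` and `locate` are hypothetical.

```python
import ipaddress

# Subnets listed in the section above. The /24 prefix lengths are an
# assumption: the page gives only the first three octets of each subnet.
LOCATIONS = {
    "128.95.44.0/24": "Biostr dept, Jim, and Bill's offices",
    "128.208.161.0/24": "server room",
    "128.95.228.0/24": "T170B/165",
    # Netmask form copied from the page; it is equivalent to a /17.
    "69.91.128.0/255.255.128.0": "UW wireless",
}


def locate(address):
    """Return the location label for an IP address, or None if unknown."""
    ip = ipaddress.ip_address(address)
    for subnet, label in LOCATIONS.items():
        if ip in ipaddress.ip_network(subnet):
            return label
    return None
```

For example, `locate("128.208.161.20")` returns the server room label, which helps narrow an outage to one area when some subnets are up and others are not.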
    1819 
     20== Virtual Machine Hosts == 
     21 
     22Most SIG servers are now virtual machines (VMs) running on a few VMwareEsxi hosts. See the section "Connecting to ESXi for the first time" for information on installing the Windows GUI client used to connect to the ESXi hosts. You can view a VM's console from the GUI client, but not from the physical console of the VM host server itself. 
    1923 
    2024== Critical SIG Linux Servers == 
     
    2226=== uvula === 
    2327{{{ 
    24 Symptoms: http://sig.biostr.washington.edu/ website down, NFS data down (`ls /usr/local/data/` hangs), CVS down, SIG mailing lists down 
     28Symptoms: http://sig.biostr.washington.edu/ website down, CVS down, SIG mailing lists down 
    2529}}} 
    2630 * Location: '''T170b''', large Dell Precision 530 in the back right corner from the door 
    27  * What to do: check console for obvious messages, check network connectivity, reboot (unfortunately sometimes the best way to clear NFS hangs) 
     31 * What to do: check console for obvious messages, check network connectivity, reboot 
     32 
     33---- 
     34=== vagal === 
     35{{{ 
     36Symptoms: NFS data down (`ls /usr/local/data/` hangs) 
     37}}} 
     38 * Location: '''VM on sigvm1''', one of the Dell Power``Edge 2950 servers in the rack 
     39 * What to do: see NfsTroubleshooting, possibly reboot (unfortunately sometimes the best way to clear NFS hangs) 
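Because the symptom is a hanging `ls`, running that check by hand can also tie up the shell it is typed in. The sketch below runs the listing in a child process with a deadline, so the caller gets an answer either way. It is an illustration only: the function name `probe_path`, the 5-second default, and the `cmd` parameter are assumptions, not something the wiki page prescribes.

```python
import subprocess


def probe_path(path, timeout=5.0, cmd=("ls",)):
    """Run `cmd path` in a child process and report 'ok', 'error', or 'hang'."""
    proc = subprocess.Popen(
        [*cmd, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Request a kill but do not wait for the child: a process blocked on
        # a hung NFS mount may not exit until the server comes back.
        proc.kill()
        return "hang"
    return "ok" if returncode == 0 else "error"
```

Calling `probe_path("/usr/local/data/")` on a client would return `"hang"` in the situation the Symptoms block describes, and `"ok"` once the NFS export is reachable again.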
    2840 
    2941---- 
    3042=== axon === 
    3143{{{ 
    32 Symptoms: main LDAP server down (if both LDAP servers are down, no one can log in), sig-imap server down 
     44Symptoms: main LDAP server down (if both LDAP servers are down, no one can log in), sig-imap server down, ls /home/someuser hangs 
    3345}}} 
    3446 * Location: '''T170b'''; Dell OptiPlex 755 near Joshua's desk, clearly labeled "axon" 
    35  * What to do: reboot, if that does not work see LdapTroubleshooting 
     47 * What to do: reboot; if that does not work, see LdapTroubleshooting and NfsTroubleshooting 
     48 * Note: other machines get automount information from LDAP about the location of each home directory 
     49 * For a complete list, use `ldapsearch -xLLL  | grep automountI` 
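The `ldapsearch` output mentioned above is LDIF text, so the home directory locations can be pulled out with a small parser. This is a hedged sketch: it assumes each automount entry carries a `cn` attribute as the map key and an `automountInformation` attribute as the location, and the DNs in `SAMPLE` are invented for illustration. The function name `automount_entries` is hypothetical.

```python
# Illustrative input only: the DNs are made up, and the real directory
# layout may differ from this assumption.
SAMPLE = """\
dn: cn=detwiler,ou=auto.home,dc=example,dc=org
cn: detwiler
automountInformation: axon:/home/detwiler

dn: cn=andrew,ou=auto.home,dc=example,dc=org
cn: andrew
automountInformation: stylus:/home/andrew
"""


def automount_entries(ldif_text):
    """Map each entry's cn to its automountInformation value."""
    # Unfold LDIF continuation lines (a line starting with a single space
    # continues the previous line).
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    lines.append("")  # sentinel blank line so the last entry is flushed

    entries = {}
    current = {}
    for line in lines:
        if not line.strip():
            # A blank line ends an entry; keep it only if it has both fields.
            if "cn" in current and "automountInformation" in current:
                entries[current["cn"]] = current["automountInformation"]
            current = {}
            continue
        name, sep, value = line.partition(": ")
        if sep:
            current.setdefault(name, value)
    return entries
```

With the sample input, `automount_entries(SAMPLE)` yields a mapping from user name to `host:/path`, which shows at a glance which server exports a given home directory.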
     50 
    3651 
    3752== Important SIG Linux Servers == 
    3853 
    39 === thorax === 
     54---- 
     55=== deltoid === 
    4056{{{ 
    41 Symptoms: SIG backups, DaAtlases are down 
     57Symptoms: FME down, secondary LDAP server down (if both LDAP servers are down, no one can log in)  
    4258}}} 
    43  * Location: '''Harris 321''', one of the Dell Power``Edge 2950 servers in the rack; clearly labeled "thorax" 
     59 * Location: '''VM on sigvm12''', one of the Dell Power``Edge r610 servers in the rack 
     60 * What to do: check console for obvious messages, check network connectivity, reboot 
     61 * Note: FME depends on vagal NFS at startup, and takes about 15 minutes to build out its trees 
     62---- 
     63=== xiphoid === 
     64{{{ 
     65Symptoms: QueryManager down 
     66}}} 
     67 * Location: '''VM on sigvm12''', one of the Dell Power``Edge r610 servers in the rack 
    4468 * What to do: check console for obvious messages, check network connectivity, reboot 
    4569 * Note: this is a production server and does not depend on LDAP or NFS for startup 
     
    4973Symptoms: DXBrain down, various Wirm repos are down (bmap, eyelab, celo) 
    5074}}} 
    51  * Location: '''Harris 321''', one of the Dell Power``Edge 1750 servers in the rack; clearly labeled "sphenoid" 
     75 * Location: '''VM on sigvm12''', one of the Dell Power``Edge r610 servers in the rack 
    5276 * What to do: check console for obvious messages, check network connectivity, reboot 
    5377 * Note: this is a production server and does not depend on LDAP or NFS for startup 
     
    5579=== cuboid === 
    5680{{{ 
    57 Symptoms: DXBrain CSM queries down, SIG Publications website is down 
     81Symptoms: DXBrain CSM queries down, SIG Publications website http://sigpubs.biostr.washington.edu/ is down 
    5882}}} 
    59  * Location: '''Harris 321''', one of the Dell Power``Edge 1750 servers in the rack; clearly labeled "cuboid" 
     83 * Location: '''VM on sigvm11''', one of the Dell Power``Edge r610 servers in the rack 
    6084 * What to do: check console for obvious messages, check network connectivity, reboot 
    6185 * Note: WIX CSM depends on NFS at startup 
    6286---- 
    63 === xiphoid === 
     87=== lamina === 
    6488{{{ 
    65 Symptoms: FME down, FMA OWL queries don't work 
     89Symptoms: SIG wiki website http://trac.biostr.washington.edu/ is down, SVN is down 
    6690}}} 
    67  * Location: '''Harris 321''', one of the Dell Power``Edge 1750 servers in the rack; clearly labeled "xiphoid" 
     91 * Location: '''VM on sigvm11''', one of the Dell Power``Edge r610 servers in the rack 
    6892 * What to do: check console for obvious messages, check network connectivity, reboot 
    69  * Note: FME depends on NFS at startup 
     93 * Note: WIX CSM depends on NFS at startup 
    7094---- 
    71 === various /home/ exports === 
     95=== incus === 
    7296{{{ 
    73 Symptoms: ls /home/someuser hangs; http://sig.biostr.washington.edu/~someuser/ website down 
     97Symptoms: FMA live server is down, nagios monitoring system not alerting 
    7498}}} 
    75  * Location: T170b; see below and the [http://sig.biostr.washington.edu/local/inventory.html SIG Inventory Database] 
    76  * What to do: see NfsTroubleshooting 
    77  * Some key home directories: 
    78   *`axon:/home/detwiler` (Todd) 
    79   *`stylus:/home/andrew` (brain browser) 
    80   *`axon:/home/brinkley` (Jim's webpage) 
    81  * Note: user home directories are kept on the machine they theoretically use most often (most Windows users use `axon`), and exported via NFS. Other machines get automount information from LDAP about the location of each home 
    82  * For a complete list, see HomesList or use `ldapsearch -xLLL  | grep automountI` 
    83  
     99 * Location: '''VM on sigvm2''', one of the Dell OptiPlex machines in T170b 
     100 * What to do: check console for obvious messages, check network connectivity, reboot 
     101 * Note: this is a production server and does not depend on LDAP or NFS for startup 
     102---- 
     103=== thorax === 
     104{{{ 
     105Symptoms: SIG backups, DaAtlases are down 
     106}}} 
     107 * Location: '''F500f''', one of the Dell Power``Edge 2950 servers in the rack; clearly labeled "thorax" 
     108 * What to do: check console for obvious messages, check network connectivity, reboot 
     109 * Note: this is a production server and does not depend on LDAP or NFS for startup
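When a VM host itself is down, it helps to know which guests go down with it. The sketch below groups the servers by the host named in their v10 "Location" lines. Only entries whose server name is visible in this diff are included, so the DXBrain server on sigvm12 (its heading is not shown here) is left out. The names `VM_HOSTS` and `guests_by_host` are hypothetical.

```python
from collections import defaultdict

# Guest -> VM host, copied from the v10 "Location" lines in this diff.
VM_HOSTS = {
    "vagal": "sigvm1",
    "deltoid": "sigvm12",
    "xiphoid": "sigvm12",
    "cuboid": "sigvm11",
    "lamina": "sigvm11",
    "incus": "sigvm2",
}


def guests_by_host(vm_hosts):
    """Invert a guest -> host mapping into host -> sorted list of guests."""
    grouped = defaultdict(list)
    for guest, host in sorted(vm_hosts.items()):
        grouped[host].append(guest)
    return dict(grouped)
```

For instance, `guests_by_host(VM_HOSTS)["sigvm11"]` lists cuboid and lamina, so a failure of that host would explain the SIG Publications site, the wiki, and SVN all being down at once.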