Incident Response Checklist

This is a quick checklist for any incident (security, privacy, outage, degraded service, etc.) to ensure the team can focus on time-critical mitigation/remediation while still communicating appropriately.

This is a checklist/overview document!
For detailed information, see the Security Incident Response Guide.

Checklist

Initiate

  • Incident declared in #login-situation
  • Situation Lead and team assemble in War Room (See the Topic in #login-situation channel for the link)
  • Situation Lead asks for more participants if needed:
    • During business hours:
      • Call in on-call members using the @login-appdev-oncall and @login-devops-oncall handles in Slack
      • Use @here in #login-situation if still understaffed
    • After hours:
      • Slack or OpsGenie used to alert additional responders (See Emergency Contacts if needed)
  • Roles assigned and duties started:
    • Situation Lead (SL): Responsible for ensuring all following steps are completed
    • Scribe (SC): Posts notes on significant events observed in the War Room (hangout) to #login-situation, producing a timeline and keeping those not in the room informed (Just notes - Not a transcript!)
    • Technical Lead (TL): Leads technical investigation and mitigation
    • Comms Lead (CL): Coordinates communication outside of #login-situation, within GSA, and if needed, with partners and the public

Assess

  • Incident confirmed
    • System security potentially compromised
    • System unavailable or functionality degraded
    • System under significant active attack from outside or inside threat
    • System integrity in question
  • Severity assigned (can be changed later as new information is collected)
    • High: Confirmed PII breach, confirmed security penetration, complete outage
    • Medium: Suspected PII breach, suspected security penetration, partial outage
    • Low: Suspected attack, outage of non-prod persistent system (int)
  • If user or partner impacting, StatusPage updated
  • Checked Incident Response Runbooks for relevant runbooks to execute
  • If secure shared notepad is needed, Google Doc opened and shared https://drive.google.com/drive/folders/1TWTMp_w55niNuqC7vTPDEe5vkxaiP4P0 (Contents should be copied to official issue)
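
The severity levels above can be read as a "worst finding wins" rule. The following is a minimal Python sketch of that reading, not official tooling: the finding labels (such as `confirmed_pii_breach`), the `Severity` enum, and the `assign_severity` function are hypothetical names introduced for illustration, and the rule that the most severe matching finding decides the level is an assumption.

```python
from enum import Enum


class Severity(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


# Hypothetical labels for the findings listed under each severity level.
FINDINGS_BY_SEVERITY = {
    Severity.HIGH: {
        "confirmed_pii_breach",
        "confirmed_security_penetration",
        "complete_outage",
    },
    Severity.MEDIUM: {
        "suspected_pii_breach",
        "suspected_security_penetration",
        "partial_outage",
    },
    Severity.LOW: {
        "suspected_attack",
        "non_prod_persistent_outage",
    },
}


def assign_severity(findings):
    """Return the highest severity matched by any finding, or None if none match."""
    observed = set(findings)
    # Check from most to least severe so the worst finding wins (an assumption).
    for severity in (Severity.HIGH, Severity.MEDIUM, Severity.LOW):
        if observed & FINDINGS_BY_SEVERITY[severity]:
            return severity
    return None
```

Calling `assign_severity` again whenever new findings arrive mirrors the note that severity can be changed later as new information is collected.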

Remediate

  • For security incidents, consult official policy before destroying ANY evidence! Contain: detach a compromised instance; do not destroy it!

Loop through per-role items until remediation is complete.

By Role

  • Situation Lead (SL)
    • Wellbeing of group monitored, including self (Tired and stressed humans make poor decisions)
    • Rotations of all roles planned and performed to prevent any responder spending more than 3 hours in role
  • Technical Lead (TL)
    • Technical response led until issue is remediated or role is handed off
  • Comms Lead (CL)
    • Regular updates to interested parties provided
    • StatusPage updated as status changes
  • Scribe (SC)
    • Ensure a timeline of significant events is recorded in the #login-situation Slack channel
    • Relay information to help someone NOT in the war room who wants to understand the incident
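
The 3-hour limit on time in role can be tracked with simple timestamp arithmetic. The sketch below is one possible way a Situation Lead might check it, under stated assumptions: the function name `roles_due_for_rotation`, the dictionary of role start times, and the 30-minute warning margin are all hypothetical and not part of the checklist.

```python
from datetime import datetime, timedelta

# Limit taken from the checklist: no responder spends more than 3 hours in role.
MAX_TIME_IN_ROLE = timedelta(hours=3)


def roles_due_for_rotation(role_started_at, now, warn_before=timedelta(minutes=30)):
    """Return the sorted roles that are within `warn_before` of the limit, or past it.

    `role_started_at` maps a role label (for example "SL") to the datetime
    at which the current holder took over. The warning margin is an assumption.
    """
    threshold = MAX_TIME_IN_ROLE - warn_before
    return sorted(
        role
        for role, started in role_started_at.items()
        if now - started >= threshold
    )


# Example: SL started at 09:00, TL took over at 11:00.
start = datetime(2024, 1, 1, 9, 0)
started = {"SL": start, "TL": start + timedelta(hours=2)}
due = roles_due_for_rotation(started, start + timedelta(hours=2, minutes=45))
```

Warning ahead of the limit, rather than at it, leaves time to arrange a handoff before the 3 hours are up.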

Upon remediation:

  • End of incident signaled in #login-situation

Retrospect

  • Postmortem doc started from copy of Postmortem Template
  • Postmortem meeting scheduled with entire incident response team

Resources