Incident Response Checklist
This is a quick checklist for any incident (security, privacy, outage, degraded service, etc.) to ensure the team can focus on time-critical mitigation/remediation while still communicating appropriately.
This is a checklist/overview document!
For detailed information, see the Security Incident Response Guide.
Checklist
Initiate
- Incident declared in #login-situation
- Situation Lead and team assemble in War Room (See the Topic in #login-situation channel for the link)
- Situation Lead asks for more participants if needed:
- During business hours:
- Call in on-call members using the @login-appdev-oncall and @login-devops-oncall handles in Slack
- Use @here in #login-situation if still understaffed
- After hours:
- Slack or OpsGenie used to alert additional responders (See Emergency Contacts if needed)
- Roles assigned and duties started:
- Situation Lead (SL): Responsible for ensuring all following steps are completed
- Scribe (SC): Posts notes on significant events observed in the War Room (hangout) to #login-situation, to produce a timeline and to keep others not in the room informed (Just notes - Not a transcript!)
- Technical Lead (TL): Leads technical investigation and mitigation
- Checks for relevant Incident Response Runbooks
- Ensures execution of relevant runbook steps, subdelegating as needed
- Comms Lead (CL): Coordinates communication outside of #login-situation, within GSA, and if needed, with partners and the public
- Issue created as official record for incident: Incident Template
- Incident Review document started and shared: Incident Review Google Doc
- GSA IR Email Template used to create and send notice to GSA Incident Response (gsa-ir@gsa.gov), IT Service Desk (itservicedesk@gsa.gov, or GSA IT Helpline called), and our GSA ISSO and ISSM within 1 hour of start of incident
Assess
- Incident confirmed
- System security potentially compromised
- System unavailable or functionality degraded
- System under significant active attack from outside or inside threat
- System integrity in question
- Severity assigned (can be changed later as new information is collected)
- High: Confirmed PII breach, confirmed security penetration, complete outage
- Medium: Suspected PII breach, suspected security penetration, partial outage
- Low: Suspected attack, outage of non-prod persistent system (int)
- If user or partner impacting, StatusPage updated
- Checked Incident Response Runbooks for relevant runbooks to execute
- If secure shared notepad is needed, Google Doc opened and shared https://drive.google.com/drive/folders/1TWTMp_w55niNuqC7vTPDEe5vkxaiP4P0 (Contents should be copied to official issue)
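The severity levels above can be sketched as a small lookup. This is only an illustration: the condition labels and the `assign_severity` function are hypothetical names derived from the checklist wording, and picking the highest matching level when several conditions apply is an assumption, not something the checklist states.

```python
from enum import Enum


class Severity(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


# Hypothetical condition labels, paraphrased from the severity list above.
HIGH_CONDITIONS = {
    "confirmed_pii_breach",
    "confirmed_security_penetration",
    "complete_outage",
}
MEDIUM_CONDITIONS = {
    "suspected_pii_breach",
    "suspected_security_penetration",
    "partial_outage",
}
LOW_CONDITIONS = {
    "suspected_attack",
    "non_prod_persistent_outage",  # e.g. outage of int
}


def assign_severity(observed):
    """Return the highest severity matching any observed condition.

    Assumption: when several conditions apply, the most severe one wins.
    Returns None when nothing has been observed yet.
    """
    observed = set(observed)
    known = HIGH_CONDITIONS | MEDIUM_CONDITIONS | LOW_CONDITIONS
    unknown = observed - known
    if unknown:
        raise ValueError(f"Unrecognized conditions: {sorted(unknown)}")
    if observed & HIGH_CONDITIONS:
        return Severity.HIGH
    if observed & MEDIUM_CONDITIONS:
        return Severity.MEDIUM
    if observed & LOW_CONDITIONS:
        return Severity.LOW
    return None
```

Because severity can change as new information is collected, a function like this would simply be re-run with the updated set of observed conditions.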
Remediate
- For security incidents, consult official policy before destroying ANY evidence! Contain: detach a compromised instance; do not destroy it!
Loop through per-role items until remediation is complete.
By Role
- Situation Lead (SL)
- Wellbeing of group monitored, including self (Tired and stressed humans make poor decisions)
- Rotations of all roles planned and performed to prevent any responder spending more than 3 hours in role
- Technical Lead (TL)
- Technical response led until issue is remediated OR role is handed off
- Comms Lead (CL)
- Regular updates to interested parties provided
- StatusPage updated as status changes
- Scribe (SC)
- Ensure a timeline of significant events is recorded in the #login-situation Slack channel
- Relay information to help someone NOT in the war room who wants to understand the incident
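The 3-hour rotation limit for the Situation Lead to track can be sketched as a simple time check. This is a minimal sketch under stated assumptions: the `roles_due_for_rotation` function and the dictionary of role start times are hypothetical, and nothing here reflects actual team tooling.

```python
from datetime import datetime, timedelta

# From the checklist: no responder should spend more than 3 hours in a role.
MAX_TIME_IN_ROLE = timedelta(hours=3)


def roles_due_for_rotation(role_started_at, now):
    """Return the roles whose current holder has reached the 3-hour limit.

    role_started_at: hypothetical mapping of role abbreviation (SL, TL, CL, SC)
    to the datetime at which the current responder took over that role.
    """
    return sorted(
        role
        for role, started in role_started_at.items()
        if now - started >= MAX_TIME_IN_ROLE
    )
```

For example, with the Technical Lead in role since 09:00 and the Scribe since 11:30, a check at 12:15 would flag only `TL`.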
Upon remediation:
- End of incident signaled in #login-situation
Retrospect
- Postmortem doc started from copy of Postmortem Template
- Postmortem meeting scheduled with entire incident response team
Resources
- Login.gov Security Incident Response Guide: IR guidance and overview; defer to the official IR plan
- Official Login.gov Incident Response plan: The authoritative source for login
- TTS incident response process
- GSA IT - IT Security Procedural Guide: Incident Response
- NIST 800-61r2 Computer Security Incident Response Guide