Incident Response Checklist

This is a quick checklist for any incident (security, privacy, outage, degraded service, etc.) to ensure the team can focus on time critical mitigation/remediation while still communicating appropriately.

Quick Links

Situation Lead: Situation Room link on #login-situation bookmark bar
Situation Lead: Declare Incident Workflow
Situation Lead: Impact Assessment
Scribe: Copy Incident Review Template
Scribe: Postmortems Folder
Messenger: GSA IR Email Template
Technical Lead: Incident Response Runbooks

Start

There is one checklist per-role starting with the Situation Lead. Find and follow your appropriate role checklist. Checklists are intentionally terse with links to supporting process and information where needed.

Situation Lead: Declares incident and facilitates incident response
Tech Lead: Focuses on hands on technical response
Messenger: Passes information out of the situation room to stakeholders
Scribe: Keeps running notes in Slack on what is happening in the situation room
Responder: Everyone else in the situation room without an assigned role

These additional roles are external to, and highly engaged with, responders in the situation room:

Comms Lead: Login.gov communications lead overseeing crisis communications
Envoy: Joins agency partner situation room in case of joint incident and ensures appropriate inter-team coordination
Executive On-Call: Designated Login.gov leadership member for escalation and support

Role Checklists

Situation Lead

Initiate

Spins up Situation Room with Google Meet
Calls in additional responders to Situation Room
Calls in Security Engineer to Situation Room
Delegates roles assignments. Triage may continue with unfilled roles, if needed
- Tech Lead role assigned and focused on technical response
- Scribe role assigned and taking notes in Slack in situation thread
- Messenger roles assigned
Declares incident and facilitates incident response using the Slack “Declare Incident Workflow” on #login-situation bookmark bar
Begins situation thread in Slack #login-situation channel

Assess

Keeps Situation Room well controlled
Preforms impact assessment and severity with input from Response Team
GSA-IR briefed when asked

Contain

Determines if containment is required and to what strategy is acceptable.
Makes the decision to return to the Assess phase or move to the Remediate phase
Roles being effectively executed. Adjust/reassign as needed:
- Too many responders? Let people go
- Too few responders? Call people in
Cycle responders out (including self) has role clearly transferred. Any responder in room more than 4 hours relieved of role and asked to take a break

Remediate

Verify a recovery plan is ready
Makes decision to return to the Contain phase if additional compromise activity is reported
Spin down incident with input from Tech Lead after system have returned to normal
Close Situation Room and notify #login-situation

Retrospect

Schedule Incident Review within 1 week
Lead the Incident Review
Schedules Lessons Learned - 30 day follow up

Technical Lead

Initiate

Begins technical triage of event
Delegates technical response task to technical team
Collects evidence of incident

Assess

Technical context shared with responders in the room
Determine which Incident Response Runbooks to invoke
Creates parallel lines of investigations delegated to other responders

Contain

Follow Incident Response Runbooks where appropriated, based on the type of event to limit impact.
Delegates to Responders examination of environment task to uncover other areas of potential compromise

Remediate

Verify no additional signs of compromise must be addressed
Implement remediation and recovery plan
Confirm Normal system operation

Retrospect

Participate Lessons Learned
Provide feedback on technical response

Scribe

Initiate

Triage notes recorded in situation thread
Create Incident Review from template !!! INCIDENT REVIEW TEMPLATE. Place Incident Review document in Postmortems Folder
Add link to the incident document to situation thread

Assess

Provide verbal time check every 30 minutes
Note significant events and findings in situation thread
Ask responders to share evidence and artifacts in the situation thread.

Contain

Provide verbal time check every 30 minutes
Collects evidence provided from Responders into a single source

Remediate

Provide verbal time check every 30 minutes
Add the recovery plan to the situation thread
Noted in #login-situation when responders have drawn down

Retrospect

Construct timeline and complete Incident Review document before lessons learned meeting
Attend Incident Review

Messenger

Initiate

For public impacting incidents, post initial incident notice following StatusPage Process
Situation Report (sitrep) ticket created in identity-security-private repo
Create email notice to GSA IR, ISSM, ISSO using the Incident Response - GSA IR Email Template
Once the situation is assessed, ping @login-comms-oncall with brief triage summary

Assess

Check the Incident Comms Playbook every 30 minutes
Update the platform StatusPage (if an incident is posted) every 30 minutes

Contain

Provides update to stakeholders outside of Situation Room
Relays questions from stakeholders to Response Team
(Every 30 Minutes) Check the Incident Comms Playbook - ASSESS section
Provide a situation report during prolonged incidents

Remediate

Update StatusPage with incident end process
Provide all clear update using to GSA IR, ISSO and ISSM
(Every 30 Minutes) Check the Incident Comms Playbook - ASSESS section
Provide a situation report during prolonged incidents

Retrospect

Attend Incident Review

Responder

Initiate

Volunteers for unfilled roles
Ask where assistance is needed from Tech Lead

Assess

Support Tech Lead with parallel tasks as needed
Share additional relevant evidence or suggestions when appropriate
Ask to leave if you have no actions to take

Contain

Monitors environment for additional signs of compromise, based on direction from Tech Lead

Remediate

Assists Tech Lead in implementing recovery and remediation plan

Retrospect

Participate in the Incident Review if you performed actions during the incident

Comms Lead

Notified by the @login-comms-oncall Slack handle; (Target: 30 minutes before crisis comms level reached)
Monitors the situation thread
If needed, briefly joins situation room to gather context
Follows the Login.gov Incident Comms Playbook

Envoy

Notified by partner email to Partner Down address
Check in with Situation Lead if incident is active
Use AWS Incident Manager or phone to pull in responders if a situation has not been declared
NOT acting as Login.gov Situation Lead
Joins partner situation room (or equivalent)
Important status and context communicated between Login.gov and partner situation rooms
Can ask for technical resource from Login.gov situation room to join partner room
Can not bring partner responders into Login.gov situation room

Executive On-Call

Notified by the @login-executive-oncall Slack handle
Monitors the situation thread
Ensure protection and support of incident responders

Resources

Contact List

Emergency Contact List: Private emergency contact list, includes contact and escalation information for Login.gov, GSA, and vendors.

Declaring an Incident Workflow

In most cases the Declare Incident Slack workflow should be used to initiate and incident. To use:

Enter the #login-situation channel
Either:
- Type /declare and hit enter to be prompted with a form to enter basic information.
- Select “Declare Incident” from the pinned “Workflows” folder up top

Early in the response is may be hard to assess impact. The Situation Lead should perform a quick [impact assessment]/articles/incident-response-guide.html#incident-severities) to set the initial impact, and it can be revised as needed later.

The full list of roles may not be known at the time of posting. Leave unassigned roles blank and ensure they are documented in the response Slack thread as they are assigned.

Once posted the team should use a thread under the incident declaration in the channel. This allows for additional threads to be established and multiple sub-incidents to be split off while remaining in the #login-situation channel.

Situation Report

There will be times, particularly in a prolonged outage, where sharing a point in time situation report will be needed. Here is a suggested format with the expectation that it would be posted in Slack and shared with leadership.

Subject: YYYY-MM-DD HH:MM Situation Report for ongoing incident INCIDENT_NAME

* Incident Review Thread: SLACK_THREAD_LINK
* Phase: Initiate|Assess|Remediate|Retrospect
* Severity: High|Med|Low
* Current responders:
  * Situation Lead: NAME
  * Technical Lead: NAME
  * Messenger: NAME
  * Scribe: NAME
* Incident Communications triggered: yes|no

UPDATE NARRATIVE

The UPDATE NARRATIVE is targeted toward leadership. Here are some suggested items to address (with the leader’s view in parenthesis):

What is the current state of the incident? (Where are we?)
What progress has been made since the last update? (Are we moving?)
What is being done now and what might be done next? (Where are we going?)
Does the team need support to continue to respond? (What do you need from me?)
Is there an estimate of when service might be restored? (ETA?)

The last question is often unanswerable. That is OK! You can always say: “We don’t know right now and we will tell you when we have more information.”