Incident Response Checklist

This is a quick checklist for any incident (security, privacy, outage, degraded service, etc.). It helps the team focus on time-critical mitigation and remediation while still communicating appropriately.


Start

There is one checklist per role, starting with the Situation Lead. Find and follow the checklist for your role. Checklists are intentionally terse, with links to supporting process and information where needed.

  • Situation Lead: Declares the incident and facilitates incident response
  • Tech Lead: Focuses on hands-on technical response
  • Messenger: Passes information out of the situation room to stakeholders
  • Scribe: Keeps running notes in Slack on what is happening in the situation room
  • Responder: Everyone else in the situation room without an assigned role

These additional roles are external to, and highly engaged with, responders in the situation room:

  • Comms Lead: Login.gov communications lead overseeing crisis communications
  • Envoy: Joins the agency partner's situation room in the case of a joint incident and ensures appropriate inter-team coordination
  • Executive On-Call: Designated Login.gov leadership member for escalation and support

Role Checklists

Situation Lead

Initiate

  1. Spins up Situation Room with Google Meet
  2. Calls in additional responders to Situation Room
  3. Calls in Security Engineer to Situation Room
  4. Delegates role assignments. Triage may continue with unfilled roles if needed
    • Tech Lead role assigned and focused on technical response
    • Scribe role assigned and taking notes in Slack in situation thread
    • Messenger roles assigned
  5. Declares the incident using the Slack “Declare Incident” workflow on the #login-situation bookmark bar, then facilitates incident response
  6. Begins the situation thread in the Slack #login-situation channel

Assess

  1. Keeps Situation Room well controlled
  2. Performs the impact and severity assessment with input from the Response Team
  3. GSA-IR briefed when asked

Contain

  1. Determines whether containment is required and which containment strategy is acceptable
  2. Makes the decision to return to the Assess phase or move to the Remediate phase
  3. Ensures roles are being executed effectively; adjusts or reassigns as needed:
    • Too many responders? Let people go
    • Too few responders? Call people in
  4. Cycles responders out (including self), ensuring each role is clearly transferred. Any responder in the room for more than 4 hours is relieved of their role and asked to take a break

Remediate

  1. Verify a recovery plan is ready
  2. Makes the decision to return to the Contain phase if additional compromise activity is reported
  3. Spin down the incident, with input from the Tech Lead, after systems have returned to normal
  4. Close Situation Room and notify #login-situation

Retrospect

  1. Schedule Incident Review within 1 week
  2. Lead the Incident Review
  3. Schedules Lessons Learned (30-day follow-up)

Technical Lead

Initiate

  1. Begins technical triage of event
  2. Delegates technical response tasks to the technical team
  3. Collects evidence of incident

Assess

  1. Technical context shared with responders in the room
  2. Determine which Incident Response Runbooks to invoke
  3. Creates parallel lines of investigation delegated to other responders

Contain

  1. Follow the Incident Response Runbooks where appropriate, based on the type of event, to limit impact
  2. Delegates examination of the environment to Responders to uncover other areas of potential compromise

Remediate

  1. Verify that no additional signs of compromise remain to be addressed
  2. Implement remediation and recovery plan
  3. Confirm normal system operation

Retrospect

  • Participate in Lessons Learned
  • Provide feedback on technical response

Scribe

Initiate

  1. Triage notes recorded in situation thread
  2. Create the Incident Review document from the INCIDENT REVIEW TEMPLATE and place it in the Postmortems folder
  3. Add a link to the Incident Review document to the situation thread

Assess

  1. Provide verbal time check every 30 minutes
  2. Note significant events and findings in situation thread
  3. Ask responders to share evidence and artifacts in the situation thread.

Contain

  • Provide verbal time check every 30 minutes
  • Collects evidence provided by Responders into a single source

Remediate

  • Provide verbal time check every 30 minutes
  • Add the recovery plan to the situation thread
  • Note in #login-situation when responders have drawn down

Retrospect

  • Construct the timeline and complete the Incident Review document before the Lessons Learned meeting
  • Attend Incident Review

Messenger

Initiate

  1. For public-impacting incidents, post the initial incident notice following the StatusPage Process
  2. Situation Report (sitrep) ticket created in the identity-security-private repo
  3. Create an email notice to GSA IR, the ISSM, and the ISSO using the Incident Response - GSA IR Email Template
  4. Once the situation is assessed, ping @login-comms-oncall with brief triage summary

Assess

Contain

Remediate

Retrospect

  • Attend Incident Review

Responder

Initiate

  • Volunteers for unfilled roles
  • Ask the Tech Lead where assistance is needed

Assess

  • Support Tech Lead with parallel tasks as needed
  • Share additional relevant evidence or suggestions when appropriate
  • Ask to leave if you have no actions to take

Contain

  • Monitors environment for additional signs of compromise, based on direction from Tech Lead

Remediate

  • Assists Tech Lead in implementing recovery and remediation plan

Retrospect

  • Participate in the Incident Review if you performed actions during the incident

Comms Lead

  • Notified via the @login-comms-oncall Slack handle (target: 30 minutes before the crisis comms level is reached)
  • Monitors the situation thread
  • If needed, briefly joins situation room to gather context
  • Follows the Login.gov Incident Comms Playbook

Envoy

  • Notified by partner email to the Partner Down address
  • Check in with the Situation Lead if an incident is active
  • Use AWS Incident Manager or phone to pull in responders if a situation has not been declared
  • NOT acting as Login.gov Situation Lead
  • Joins partner situation room (or equivalent)
  • Important status and context communicated between Login.gov and partner situation rooms
  • Can ask for technical resources from the Login.gov situation room to join the partner room
  • Cannot bring partner responders into the Login.gov situation room

Executive On-Call

  • Notified by the @login-executive-oncall Slack handle
  • Monitors the situation thread
  • Ensures incident responders are protected and supported

Resources

Contact List

Emergency Contact List: A private list with contact and escalation information for Login.gov, GSA, and vendors.

Declaring an Incident Workflow

In most cases, the Declare Incident Slack workflow should be used to initiate an incident. To use it:

  1. Enter the #login-situation channel
  2. Either:
    • Type /declare and hit enter to be prompted with a form to enter basic information.

      Screenshot of /declare workflow

    • Select “Declare Incident” from the pinned “Workflows” folder up top

      Screenshot of declare workflow in menus

Early in the response, it may be hard to assess impact. The Situation Lead should perform a quick impact assessment (see /articles/incident-response-guide.html#incident-severities) to set the initial severity, which can be revised later as needed.

The full list of roles may not be known at the time of posting. Leave unassigned roles blank and ensure they are documented in the response Slack thread as they are assigned.

Once the declaration is posted, the team should respond in a thread under it in the channel. This allows additional threads to be established and multiple sub-incidents to be split off while remaining in the #login-situation channel.

Situation Report

There will be times, particularly in a prolonged outage, when sharing a point-in-time situation report is needed. Here is a suggested format, with the expectation that it will be posted in Slack and shared with leadership.

Subject: YYYY-MM-DD HH:MM Situation Report for ongoing incident INCIDENT_NAME

* Incident Review Thread: SLACK_THREAD_LINK
* Phase: Initiate|Assess|Contain|Remediate|Retrospect
* Severity: High|Med|Low
* Current responders:
  * Situation Lead: NAME
  * Technical Lead: NAME
  * Messenger: NAME
  * Scribe: NAME
* Incident Communications triggered: yes|no

UPDATE NARRATIVE

The UPDATE NARRATIVE is targeted toward leadership. Here are some suggested items to address (with the leader’s view in parentheses):

  • What is the current state of the incident? (Where are we?)
  • What progress has been made since the last update? (Are we moving?)
  • What is being done now and what might be done next? (Where are we going?)
  • Does the team need support to continue to respond? (What do you need from me?)
  • Is there an estimate of when service might be restored? (ETA?)

The last question is often unanswerable. That is OK! You can always say: “We don’t know right now and we will tell you when we have more information.”
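During a long incident, sitreps may be posted repeatedly, and filling the template by hand each time invites drift. The following is a minimal Python sketch that renders the suggested format from structured fields. It rejects phase or severity values outside the template's options. It is not an existing Login.gov tool: the `SitRep` class and its field names are hypothetical. Posting to Slack is deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical helper; allowed values follow the sitrep template.
# "Contain" is included because it is one of the five response phases
# used throughout the role checklists.
PHASES = ("Initiate", "Assess", "Contain", "Remediate", "Retrospect")
SEVERITIES = ("High", "Med", "Low")
ROLES = ("Situation Lead", "Technical Lead", "Messenger", "Scribe")


@dataclass
class SitRep:
    incident_name: str
    thread_link: str
    phase: str
    severity: str
    responders: dict[str, str]  # role -> name; unfilled roles may be omitted
    comms_triggered: bool
    narrative: str
    timestamp: datetime = field(default_factory=datetime.now)

    def render(self) -> str:
        """Return the sitrep as plain text in the suggested format."""
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        lines = [
            f"Subject: {self.timestamp:%Y-%m-%d %H:%M} Situation Report "
            f"for ongoing incident {self.incident_name}",
            "",
            f"* Incident Review Thread: {self.thread_link}",
            f"* Phase: {self.phase}",
            f"* Severity: {self.severity}",
            "* Current responders:",
        ]
        for role in ROLES:
            # Unassigned roles are rendered blank rather than dropped.
            lines.append(f"  * {role}: {self.responders.get(role, '')}")
        comms = "yes" if self.comms_triggered else "no"
        lines.append(f"* Incident Communications triggered: {comms}")
        lines += ["", self.narrative]
        return "\n".join(lines)


if __name__ == "__main__":
    report = SitRep(
        incident_name="INCIDENT_NAME",
        thread_link="SLACK_THREAD_LINK",
        phase="Assess",
        severity="Med",
        responders={"Situation Lead": "NAME", "Scribe": "NAME"},
        comms_triggered=False,
        narrative="UPDATE NARRATIVE",
    )
    print(report.render())
```

Validating phase and severity against fixed tuples catches typos such as "Contained" before a report reaches leadership. Leaving unassigned roles blank matches the guidance to document roles as they are filled.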