Any engineer should be able to be oncall and we encourage all engineers to join the rotation to help distribute the load. Before being added to the oncall rotation, an engineer must have these prerequisites:
- Access to OpsGenie
- Deploy access to production
- SSM access to production
- Membership in the #identity-situation and #login-partnerships channels
OpsGenie Team & Rotations
For OpsGenie access, ask Mo and the devops team.
Team: End User
Your first emergency contact should always be @login-devops-oncall. Make sure they are aware anytime things are going poorly.
For Login.gov and vendor emergency contact information, see Emergency Contacts.
The AppDev Rotation hands off every Monday at 12pm Eastern (9am Pacific).
When handing off:
- Update the @login-appdev-oncall Slack handle to be the new person
- The outgoing oncall person should let the incoming person know about any outstanding issues or bugs
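As a convenience for reminders or tooling, the handoff schedule above can be computed programmatically. The following is a minimal sketch, not part of any existing Login.gov tooling; the function name `next_handoff` and the constants are hypothetical, and it assumes the caller passes a timezone-aware datetime.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
HANDOFF_WEEKDAY = 0         # Monday, in datetime.weekday() numbering
HANDOFF_TIME = time(12, 0)  # 12pm Eastern


def next_handoff(now: datetime) -> datetime:
    """Return the first Monday 12:00 Eastern strictly after `now`.

    `now` is assumed to be timezone-aware; a naive datetime would be
    interpreted in the system's local timezone by astimezone().
    """
    local = now.astimezone(EASTERN)
    days_ahead = (HANDOFF_WEEKDAY - local.weekday()) % 7
    candidate = datetime.combine(
        local.date() + timedelta(days=days_ahead), HANDOFF_TIME, tzinfo=EASTERN
    )
    if candidate <= local:
        # It is already Monday at or after noon, so use next week's handoff.
        candidate = datetime.combine(
            candidate.date() + timedelta(days=7), HANDOFF_TIME, tzinfo=EASTERN
        )
    return candidate
```

Converting the result with `.astimezone(ZoneInfo("America/Los_Angeles"))` gives the 9am Pacific equivalent, including across daylight saving changes.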
Each day, check NewRelic for server and browser errors over the last 24h in staging (there is a Slack reminder in #login-appdev for this).
We want to get as many errors fixed as possible, so make sure JIRA tickets are filed for all errors in NewRelic. Search JIRA first to check whether a ticket has already been filed.
Throughout the week, check for automated vulnerability pull requests and try to get them merged. These links go to GitHub pull request filters; search within these for ones to
Inspector General (IG) Requests
- Check the Guide for responding to IG requests
- Requests will be forwarded via email.
- It is expected that the AppDev who receives the request will be the one to complete it, even if it extends beyond the on-call week.
Expiring PKI Certs
If you see a Slack alert about an expiring certificate, it means that a cert used to verify PIV/CAC cards will expire in 30 days. Check the Federal Public Key Infrastructure Guides Certificate Authorities list for info on the most up-to-date certs.
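To double-check an alert by hand, the 30-day window can be verified from a certificate's expiry date. The sketch below is illustrative only and is not the code behind the Slack alert. It assumes you have already obtained the cert's `notAfter` value as a string in the format printed by `openssl x509 -noout -enddate` (for example `Jan  5 09:34:43 2018 GMT`); the helper names are hypothetical.

```python
import ssl
from datetime import datetime, timedelta, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and the cert's notAfter time (negative if expired).

    `not_after` uses the "%b %d %H:%M:%S %Y GMT" format that
    ssl.cert_time_to_seconds() parses; `now` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now) / timedelta(days=1)


def expires_soon(not_after: str, now: datetime, window_days: int = 30) -> bool:
    """True if the cert expires within `window_days` or has already expired."""
    return days_until_expiry(not_after, now) <= window_days
```

For example, `expires_soon("Jan  5 09:34:43 2018 GMT", datetime.now(timezone.utc))` reports whether that (long expired) certificate falls inside the window.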
SecOps Incident Response Guide located here
Things to consider when assessing severity:
- If PII is involved
- The environment it is in and status of partner(s) impacted
- Number of users impacted
- Whether the issue is in a primary or secondary flow
Involves an active (launched) partner in Production environment
- High-sev incidents successfully compromise the confidentiality/integrity of Personally Identifiable Information (PII), impact the availability of services for a large number of customers, or have significant financial impact.
- An active (launched) Login.gov partner is reporting that no user can authenticate or proof.
- Required to be addressed immediately and ongoing until resolved.
- Med-sev incidents represent attempts (possibly un- or not-yet-successful) at breaching PII, or those with limited availability/financial impact.
- An active (launched) Login.gov partner is reporting that some users are not able to authenticate or proof in production.
- A partner is reporting that the sandbox/INT environment is down and no user can authenticate or proof.
- Will be addressed immediately during business hours.
- Responders should attempt to consult stakeholders before causing downtime, but may proceed without them if they can’t be contacted in a reasonable time-frame.
- Low-sev incidents don’t affect PII, and have no availability or financial impact.
- A new partner recently deployed to production is launching their application after hours and reporting that users cannot authenticate or proof.
- A partner is reporting that some users are not able to authenticate or proof in sandbox/INT.
- Responders should avoid service degradation unless stakeholders agree.
- Will be addressed in the normal course of business and prioritized against other pending Jira issues (or potentially added to the backlog for the future).
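The severity criteria above can be read as a rough decision order. The following sketch encodes one possible reading of them and is not an official classifier: the `Incident` fields, the `assess` function, and the order of the checks are all assumptions made for illustration, and financial impact is left out because the guidance gives no concrete threshold for it. Human judgment and the SecOps guide take precedence.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Blocked(Enum):
    NONE = "none"  # users can authenticate and proof
    SOME = "some"  # some users cannot authenticate or proof
    ALL = "all"    # no user can authenticate or proof


@dataclass(frozen=True)
class Incident:
    # All field names are hypothetical, chosen to mirror the criteria above.
    pii_compromised: bool = False       # confidentiality/integrity of PII breached
    pii_breach_attempted: bool = False  # attempt, possibly unsuccessful
    environment: str = "prod"           # "prod" or "int" (sandbox)
    partner_launched: bool = True       # active (launched) partner
    users_blocked: Blocked = Blocked.NONE


def assess(incident: Incident) -> Severity:
    """Map an incident to a severity using the examples in this guide."""
    launched_in_prod = incident.environment == "prod" and incident.partner_launched
    if incident.pii_compromised:
        return Severity.HIGH
    if launched_in_prod and incident.users_blocked is Blocked.ALL:
        return Severity.HIGH
    if incident.pii_breach_attempted:
        return Severity.MEDIUM
    if launched_in_prod and incident.users_blocked is Blocked.SOME:
        return Severity.MEDIUM
    if incident.environment == "int" and incident.users_blocked is Blocked.ALL:
        return Severity.MEDIUM
    # Everything else, e.g. an unlaunched partner or partial sandbox issues.
    return Severity.LOW
```

Under these assumptions, `assess(Incident(environment="int", users_blocked=Blocked.SOME))` yields `Severity.LOW`, matching the sandbox example in the low-sev bullets.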
Inspector General (IG) Requests
- Generally expected to be answered in five business days.
- More complicated requests may take longer; expected turnaround should be communicated.
- On occasion, requests are deemed urgent and should be made a priority.
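To set expectations when a request arrives, the five-business-day target can be turned into a due date. This is a minimal sketch with hypothetical helper names; it assumes the day of receipt is not counted and skips weekends only, so federal holidays would need to be handled separately.

```python
from datetime import date, timedelta

IG_TURNAROUND_DAYS = 5  # "five business days" from the guidance above


def add_business_days(start: date, business_days: int) -> date:
    """Return the date `business_days` weekdays after `start`.

    Counts Monday through Friday only; holidays are NOT excluded.
    """
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            remaining -= 1
    return current


def ig_due_date(received: date) -> date:
    """Expected response date for an IG request received on `received`."""
    return add_business_days(received, IG_TURNAROUND_DAYS)
```

A request received on a Thursday is therefore due the following Thursday, since the intervening weekend is skipped.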
Internal Login.gov on-call guidance
Additional on-call guidance, including time in lieu, is available in the Internal Login.gov on-call guidance Google Doc.