Files
metamask-integration/docs/INCIDENT_RESPONSE.md
2026-03-02 12:14:14 -08:00

3.2 KiB

Incident Response Plan - Smart Accounts

Date: 2026-01-26
Network: ChainID 138 (SMOM-DBIS-138)


Overview

This document outlines the incident response plan for Smart Accounts Kit integration.


Incident Classification

Severity Levels

Critical:

  • Security breach
  • Funds at risk
  • System unavailable

High:

  • Functionality degraded
  • Performance issues
  • Data integrity concerns

Medium:

  • Minor functionality issues
  • Performance degradation
  • User experience issues

Low:

  • Cosmetic issues
  • Documentation issues
  • Non-critical bugs

Response Procedures

Critical Incidents

Immediate Actions:

  1. Pause System: Pause affected contracts if possible
  2. Assess Impact: Determine scope of issue
  3. Notify Team: Alert on-call team
  4. Investigate: Identify root cause
  5. Mitigate: Take immediate mitigation steps

Communication:

  • Notify stakeholders immediately
  • Provide status updates
  • Document all actions

High Priority Incidents

Actions:

  1. Assess Impact: Determine affected users/features
  2. Investigate: Identify root cause
  3. Mitigate: Implement fixes
  4. Monitor: Watch for resolution

Communication:

  • Notify team
  • Update status page
  • Document issue

Common Incidents

Smart Account Creation Fails

Symptoms:

  • Users cannot create accounts
  • High error rate
  • Transaction failures

Response:

  1. Check RPC endpoint status
  2. Check EntryPoint contract
  3. Check AccountFactory contract
  4. Check gas prices
  5. Implement fix

Delegation Not Working

Symptoms:

  • Delegations not granted
  • dApps cannot execute
  • High failure rate

Response:

  1. Check delegation framework
  2. Check permissions
  3. Check expiry times
  4. Review recent changes
  5. Implement fix

Performance Degradation

Symptoms:

  • Slow account creation
  • High gas usage
  • Timeout errors

Response:

  1. Check network status
  2. Check RPC performance
  3. Review recent changes
  4. Optimize if needed
  5. Scale infrastructure

Escalation Path

Level 1: On-Call Engineer

  • Initial response
  • Basic troubleshooting
  • Escalate if needed

Level 2: Senior Engineer

  • Complex issues
  • Architecture decisions
  • Escalate if needed

Level 3: Engineering Lead

  • Critical issues
  • Strategic decisions
  • External coordination

Post-Incident

Incident Review

  1. Root Cause Analysis: Identify root cause
  2. Timeline: Document incident timeline
  3. Impact Assessment: Assess impact
  4. Lessons Learned: Document learnings
  5. Action Items: Create improvement tasks

Documentation

  • Incident report
  • Root cause analysis
  • Action items
  • Timeline

Prevention

Monitoring

  • Set up alerts
  • Monitor metrics
  • Track errors
  • Review logs regularly

Testing

  • Regular testing
  • Load testing
  • Security testing
  • Integration testing

Documentation

  • Keep documentation updated
  • Document procedures
  • Maintain runbooks
  • Review regularly

Resources


Last Updated: 2026-01-26