The incident response lifecycle
Preparation
Preparation is the only phase where you are not working against the clock. Use it.
- Write an Incident Response Plan (IRP): document who is called, what systems get isolated, and who has authority to make decisions.
- Establish your IR team and assign roles (incident commander, communications lead, forensics lead).
- Deploy monitoring and detection tools so you have visibility before an incident starts.
- Define your out-of-band communication channel — a phone tree, a separate messaging platform, or physical meeting location that does not depend on your production systems.
- Run tabletop exercises and drills. Checklists are only useful if people have practiced them.
Detection and identification
You cannot respond to what you cannot see.
- Monitor alerts from your IDS/IPS, SIEM, endpoint agents, and logs.
- Analyze anomalies: unusual login times, lateral movement, unexpected outbound connections, large data transfers.
- Determine whether the event is an actual incident or a false positive.
- Document your initial findings — timestamps, affected systems, indicators of compromise (IOCs).
- Assign a severity category (critical/high/medium/low) to prioritize your response.
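The triage steps above can be sketched in a few lines of Python. This is a minimal illustration, not a detection engine: the `Event` fields, the business-hours window, the transfer threshold, and the severity mapping are all assumptions made up for the example and would need tuning against your own baseline.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical thresholds for illustration; derive real ones from your baseline.
BUSINESS_HOURS = range(7, 20)            # 07:00 to 19:59 local time
LARGE_TRANSFER_BYTES = 500 * 1024 ** 2   # 500 MiB outbound


@dataclass
class Event:
    timestamp: datetime
    host: str
    kind: str          # "login" or "outbound_transfer" in this sketch
    bytes_out: int = 0


def anomalies(event: Event) -> list[str]:
    """Return the reasons, if any, that this event deviates from the assumed baseline."""
    reasons = []
    if event.kind == "login" and event.timestamp.hour not in BUSINESS_HOURS:
        reasons.append("login outside business hours")
    if event.kind == "outbound_transfer" and event.bytes_out >= LARGE_TRANSFER_BYTES:
        reasons.append("large outbound transfer")
    return reasons


def severity(reasons: list[str], asset_is_critical: bool) -> str:
    """Map findings onto the critical/high/medium/low scale (assumed mapping)."""
    if not reasons:
        return "low"   # nothing anomalous: likely a false positive
    if asset_is_critical:
        return "critical" if len(reasons) > 1 else "high"
    return "medium"
```

For example, a 03:00 login on a critical database host yields one reason and a severity of `high` under this mapping. A real triage process would also weigh data sensitivity and business impact.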
Containment
Stop the bleeding without destroying evidence.
Short-term containment:
- Preserve volatile evidence first (running processes, open network connections, memory contents) before you isolate or power down.
- Then isolate affected systems from the network (pull the cable, disable the NIC, move to a quarantine VLAN).
- Maintain chain of custody: document every action you take and every piece of evidence you collect. Do not simply reboot the server; you will lose memory artifacts and complicate any legal action.
Long-term containment:
- Apply emergency patches or configuration changes to limit further spread.
- Strengthen access controls on unaffected systems while the investigation continues.
- Implement temporary workarounds that keep the business running at reduced capacity.
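One way to "document every action you take" is a tamper-evident action log, where each entry stores the hash of the one before it. The sketch below is an assumption about how such a log could look, not a standard format; the field names and the `append_action` and `verify` helpers are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry: dict) -> str:
    """Hash every field except the entry's own hash, in a stable key order."""
    body = {k: v for k, v in entry.items() if k != "entry_hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_action(log: list[dict], responder: str, host: str, action: str) -> dict:
    """Append a timestamped action that is chained to the previous entry."""
    entry = {
        "time_utc": datetime.now(timezone.utc).isoformat(),
        "responder": responder,
        "host": host,
        "action": action,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry["entry_hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Return True only if no entry was altered, removed, or reordered."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev or entry["entry_hash"] != _digest(entry):
            return False
        prev = entry["entry_hash"]
    return True
```

Editing any earlier entry changes its hash and breaks the chain, so `verify` returns `False`. In practice the log should also be written to a system the attacker cannot reach.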
Eradication
Remove the threat from your environment completely.
- Identify the root cause: how did the attacker get in, and what did they do once inside?
- Remove all malware, webshells, backdoors, and attacker-created accounts.
- Patch the vulnerability that was exploited.
- Scan for lateral movement — assume the attacker touched more than the initially compromised host.
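Checking for lateral movement often starts with sweeping every host for the indicators found on the first one. A minimal sketch, assuming you have already collected per-host observations (file hashes, remote IPs) into sets; the data shapes and the `sweep` helper are hypothetical.

```python
def sweep(observations: dict[str, set[str]], iocs: set[str]) -> dict[str, set[str]]:
    """Return each host that shows at least one known indicator, with the matches."""
    hits = {}
    for host, seen in observations.items():
        matched = seen & iocs   # set intersection: indicators present on this host
        if matched:
            hits[host] = matched
    return hits


# Example with made-up indicators (203.0.113.0/24 is a documentation address range).
iocs = {"203.0.113.7", "hash:deadbeef"}
observations = {
    "web01": {"203.0.113.7", "198.51.100.2"},
    "db01": {"198.51.100.9"},
    "hr-laptop": {"hash:deadbeef", "203.0.113.7"},
}
affected = sweep(observations, iocs)
```

A host with no matches is not proven clean. It only means none of the currently known indicators were observed there.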
Recovery
Restore systems to normal operation with confidence.
- Restore from clean backups verified to predate the compromise. Do not restore from a backup that may itself be infected.
- Rebuild systems that cannot be fully cleaned.
- Monitor restored systems closely for signs of reinfection in the days and weeks following recovery.
- Gradually return services to production — do not restore everything at once.
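Choosing a backup that predates the compromise is a simple timestamp comparison once the investigation has established the earliest known attacker activity. A minimal sketch under that assumption; `latest_clean_backup` is a hypothetical helper, and a timestamp check alone does not replace scanning the restored data.

```python
from datetime import datetime
from typing import Optional


def latest_clean_backup(
    backup_times: list[datetime], earliest_compromise: datetime
) -> Optional[datetime]:
    """Return the newest backup taken strictly before the earliest known compromise."""
    candidates = [t for t in backup_times if t < earliest_compromise]
    return max(candidates, default=None)
```

If the function returns `None`, no backup predates the compromise and the system has to be rebuilt rather than restored.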
Lessons learned
Turn the incident into a lasting improvement.
- Conduct a post-incident review within two weeks while memory is fresh.
- Document a timeline of events, what worked, and what failed.
- Update your IRP based on gaps discovered during the incident.
- Share findings (at an appropriate level of detail) with relevant teams.
Intrusion Detection Systems (IDS)
An IDS monitors network traffic or system activity for suspicious behavior and alerts administrators when it detects something worth investigating. It is a detective control: it does not stop attacks, but it tells you they are happening.

| Type | How it works | Best for |
|---|---|---|
| Network IDS (NIDS) | Monitors traffic flowing across the network by inspecting packets | Detecting scans, known exploit traffic, C2 beacons |
| Host-based IDS (HIDS) | Runs on individual hosts; monitors system calls, file changes, and logs | Detecting rootkits, unauthorized file modifications, privilege escalation |
| Signature-based | Matches traffic/behavior against a database of known attack patterns | High-fidelity detection of known threats; misses novel attacks |
| Anomaly-based | Establishes a baseline of normal behavior and alerts on deviations | Detecting unknown threats; higher false-positive rate |
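The difference between the signature-based and anomaly-based rows can be illustrated with a toy example. The patterns, the z-score threshold, and both function names are assumptions chosen for illustration; production systems use far richer rule languages and statistical models.

```python
from statistics import mean, stdev

# Hypothetical signatures: a name mapped to a substring that marks a known attack.
SIGNATURES = {
    "path traversal": "../../",
    "sql injection probe": "' OR '1'='1",
}


def signature_alerts(payload: str) -> list[str]:
    """Signature-based: alert on any known pattern; novel attacks pass unnoticed."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def is_anomalous(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Anomaly-based: alert when a value is far from the learned baseline.

    Uses a simple z-score; the baseline needs at least two samples.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold
```

With a baseline of roughly 100 requests per minute, a reading of 400 is flagged even though no signature describes it. A legitimate traffic spike would be flagged too, which is where the higher false-positive rate comes from.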
Recovery objectives: RPO and RTO
When systems go down, two numbers drive your recovery decisions:
Recovery Point Objective (RPO)
How much data can you afford to lose?
Whether you meet your RPO depends on how frequently you take backups. If you back up every 24 hours and a failure occurs just before the next backup, you lose up to 24 hours of data. For a financial transaction system, that may be unacceptable. For a static marketing site, it may be fine.
Recovery Time Objective (RTO)
How long can your systems be down?
Whether you meet your RTO depends on your redundancy architecture and failover speed. Hot standby systems with automatic failover give you an RTO measured in seconds. A manual restore from tape might take hours or days.
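The arithmetic behind both objectives is a pair of comparisons: worst-case data loss equals the backup interval, and worst-case downtime equals the time a restore or failover takes. A minimal sketch; the function name and the example figures are illustrative assumptions.

```python
from datetime import timedelta


def meets_objectives(
    backup_interval: timedelta,
    estimated_restore_time: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> dict[str, bool]:
    """Compare worst-case data loss and downtime against the stated objectives."""
    return {
        # A failure just before the next backup loses one full interval of data.
        "rpo": backup_interval <= rpo,
        # Downtime lasts at least as long as the restore or failover takes.
        "rto": estimated_restore_time <= rto,
    }


# Daily backups and a six-hour restore, checked against a 1-hour RPO and an 8-hour RTO.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    estimated_restore_time=timedelta(hours=6),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=8),
)
```

Here the RTO is met but the RPO is not, so backups (or replication) would need to run at least hourly.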
Backup resilience: the 3-2-1 rule
Ransomware operators routinely seek out and encrypt backups before deploying their payload. Your backup architecture must survive that scenario.
- 3 copies of your data (production plus two backups)
- 2 different storage media types (e.g., NAS plus cloud, or disk plus tape)
- 1 copy offline and physically separated — air-gapped from any network the ransomware could reach
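The rule is easy to check mechanically against a backup inventory. The sketch below assumes a hypothetical `Copy` record with a media label and an offline flag; how you populate that inventory is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str                  # e.g. "disk", "tape", "cloud"
    offline: bool               # air-gapped from any network ransomware could reach
    is_production: bool = False


def check_321(copies: list[Copy]) -> dict[str, bool]:
    """Evaluate each part of the 3-2-1 rule separately so gaps are visible."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({c.media for c in copies}) >= 2,
        "one_offline": any(c.offline for c in copies if not c.is_production),
    }


inventory = [
    Copy("production", "disk", offline=False, is_production=True),
    Copy("nas-backup", "disk", offline=False),
    Copy("weekly-tape", "tape", offline=True),
]
report = check_321(inventory)
```

Returning one flag per requirement, rather than a single pass/fail, makes it clear which part of the rule a given setup violates.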
Digital forensics basics
Evidence handling during an incident determines whether you can take legal action later and whether you can fully understand what happened.
- Maintain chain of custody: log who collected each piece of evidence, when, and how. Any gap undermines admissibility.
- Create forensic images: copy disks at the bit level (tools: `dd`, FTK Imager) before analysis. Work from the copy, not the original.
- Capture volatile data first: memory dumps, running process lists, and open network connections disappear when a system powers off.
- Preserve logs: centralize logs to a separate system so an attacker who compromises a host cannot tamper with them.
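A common way to show that a forensic image is a faithful copy is to record a cryptographic hash of the original and of the image and confirm they match. A minimal sketch using Python's standard `hashlib`; the helper names are assumptions, and imaging tools typically compute such hashes themselves.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(original_path: str, image_path: str) -> bool:
    """Return True if the image is bit-for-bit identical to the original."""
    return sha256_of_file(original_path) == sha256_of_file(image_path)
```

Recording the hash in the chain-of-custody log at collection time lets anyone re-verify later that the working copy has not changed.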