IT Operations Procedures Library
This section is visible to isms-it-staff and isms-security via Scroll Content Manager. It contains the operational procedures that IT Operations staff use to run, maintain, and evidence the infrastructure supporting the ISMS.
Library contents
- OP-01 · Backup and Recovery
- OP-02 · Certificate Management
- OP-03 · Logging and SIEM Management
- OP-04 · Infrastructure Change Management
- OP-05 · BCM and DR Testing
OP-01 · Backup and Recovery
Purpose and scope
This procedure governs the backup, retention, and recovery of all CUI-scope data and systems. It implements NIST 800-171 control 3.8.9 (protect confidentiality of backup CUI), ISO 27001 Annex A 8.13 (information backup), and the organisation's business continuity obligations for CUI-scope systems.
Scope: all systems, applications, and data classified at Restricted or above. CUI-scope backup jobs are subject to the stricter requirements throughout this procedure. Non-CUI systems follow the same procedure but may use relaxed encryption and retention settings where noted.
Evidence generated: EV-D27 (daily backup logs), EV-D28 (quarterly restoration test records).
Backup architecture — the 3-2-1 standard
Three copies of all CUI data must exist at all times:
Copy 1 — Production (primary):
Location: production storage (SAN / NAS / cloud-native storage)
Encryption: at-rest encryption via KMS (AES-256, HSM-backed keys)
Purpose: active working copy
Copy 2 — Local backup (secondary):
Location: on-premises backup appliance in a different room from the
server — must NOT be in the same fire zone as production storage
Encryption: AES-256, client-side before write — key in PAM vault
Retention: 90 days rolling
Purpose: fast local recovery for common failure scenarios
Copy 3 — Offsite backup (tertiary):
Location: geographically separate cloud storage (UK or EEA region)
OR physical encrypted media in approved offsite vault
Encryption: AES-256 client-side before upload — key NEVER held by
cloud provider; key in PAM vault (separate key from Copy 2)
Retention: 12 months (full), extended to 7 years for archive tier
Purpose: recovery from site-level disaster; satisfies DFARS
and DEFSTAN data preservation requirements
Verification:
All three copies confirmed at daily job completion check (EV-D27)
If Copy 3 is unavailable for >24 hours: escalate to IT Manager
If Copy 3 is unavailable for >72 hours: escalate to CISO —
may require interim risk acceptance documentation
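The escalation thresholds above can be expressed as a small check for a monitoring script. This is a minimal sketch; the function name and constants are illustrative and not part of any deployed tooling.

```python
from typing import Optional

# Thresholds taken from the verification rules above (hours of Copy 3 unavailability).
IT_MANAGER_AFTER_HOURS = 24
CISO_AFTER_HOURS = 72


def copy3_escalation(hours_unavailable: float) -> Optional[str]:
    """Return the role to escalate to, or None while no escalation is due."""
    if hours_unavailable > CISO_AFTER_HOURS:
        return "CISO"
    if hours_unavailable > IT_MANAGER_AFTER_HOURS:
        return "IT Manager"
    return None
```

For example, copy3_escalation(30) returns "IT Manager" and copy3_escalation(80) returns "CISO".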
Backup jobs — configuration and schedule
Production data backup jobs
JOB: CUI-FileServer-Daily
Source: \\[fileserver]\CUI-Share (all subdirectories)
Schedule: Daily 01:00 UTC
Type: Incremental (Monday–Saturday), Full (Sunday)
Retention:
Daily incrementals: 90 days
Weekly fulls: 12 months
Monthly fulls (last Sunday of month): 7 years
Encryption: AES-256, client-side
Backup software: [product name — e.g. Veeam / Commvault / Azure Backup]
Key: stored in PAM vault under safe "Backup-Keys-CUI"
Target:
Primary: [local backup appliance name/IP]
Copy job: [cloud storage account / bucket — UK region]
JOB: CUI-DatabaseServer-Daily
Source: [database server] — all CUI-scope databases
Schedule: Daily 00:30 UTC (before file server to avoid contention)
Type: Full database backup + transaction log backup every 15 minutes
Retention:
Full backups: 90 days local, 12 months cloud
Transaction logs: 30 days (enables point-in-time recovery)
Encryption: Database-native TDE for local, then backup encryption for copy
Target: same as CUI-FileServer-Daily
JOB: CUI-VMs-Daily
Source: all VMs in [hypervisor cluster] tagged CUI-scope
Schedule: Daily 02:00 UTC
Type: Changed block tracking incremental; weekly full image backup
Retention: 90 days (image); weekly fulls 12 months
Encryption: AES-256 client-side
Target: same as above
JOB: SaaS-M365-Weekly
Source: Exchange Online mailboxes, SharePoint sites, OneDrive
(CUI-labelled content only via label filter)
Schedule: Weekly Sunday 03:00 UTC
Type: Full export via approved M365 backup tool
Retention: 12 months
Encryption: client-side via backup tool before cloud storage
Note: M365 native retention policies are NOT a substitute for
this backup job — they do not provide point-in-time
recovery or immutable protection
Target: dedicated backup cloud account (separate from production M365)
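The retention tiers of CUI-FileServer-Daily depend on the day a backup runs: incrementals on Monday to Saturday, fulls on Sunday, and the last Sunday of each month kept for 7 years. The sketch below shows one way to derive the tier from the run date. The function names are hypothetical, and the real retention is enforced by the backup product's own policy engine.

```python
import calendar
from datetime import date
from typing import Tuple


def is_last_sunday_of_month(run_date: date) -> bool:
    """True when run_date is a Sunday and no later Sunday falls in the same month."""
    if run_date.weekday() != 6:  # Monday is 0, Sunday is 6
        return False
    days_in_month = calendar.monthrange(run_date.year, run_date.month)[1]
    return run_date.day + 7 > days_in_month


def retention_class(run_date: date) -> Tuple[str, str]:
    """Map a CUI-FileServer-Daily run date to (backup type, retention period)."""
    if is_last_sunday_of_month(run_date):
        return ("monthly full", "7 years")
    if run_date.weekday() == 6:
        return ("weekly full", "12 months")
    return ("daily incremental", "90 days")
```

A run on 2025-06-29 (the last Sunday of June 2025) is classified as a monthly full, while 2025-06-22 is a weekly full.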
System image and configuration backups
JOB: NetworkDeviceConfig-Daily
Source: all CUI-scope firewalls, routers, switches
Schedule: Daily 04:00 UTC
Type: automated configuration export via vendor API
Tool: [Oxidized / RANCID / vendor NMS — specify deployed product]
Storage: Git repository in configuration management platform (GitLab/ADO)
(configuration repo has its own backup to cloud storage)
Diff alerts: SIEM alert if configuration diff detected between daily exports
(unexpected change indicator)
Retention: full git history (perpetual)
JOB: CloudInfraState-Daily
Source: Terraform state files; AWS CloudFormation stacks;
Azure ARM deployment states
Schedule: Daily 05:00 UTC
Type: state file export to versioned S3 bucket / Azure Blob with
versioning enabled
Retention: 12 months (via object versioning lifecycle policy)
JOB: PKI-CertAuthority-Weekly
Source: internal CA database; issued certificate records; CRL
Schedule: Weekly Sunday 05:30 UTC
Type: full CA database export
Encryption: AES-256 — key held by CISO only (separate from standard backup keys)
Storage: encrypted backup appliance + separate encrypted media in CISO's safe
Retention: permanent (CA compromise history is a legal record)
Note: CA backup tested separately from standard backup restoration test
Backup monitoring — daily operations (generates EV-D27)
Morning checks — daily by IT Operations (08:30 each working day)
Step 1 — Log into backup management console
URL/path: [backup console URL]
Authentication: PAM-mediated (standard account → PAM → backup console admin)
Step 2 — Review overnight job results
Navigate to: Jobs → Last 24 hours → Sort by status
Expected status per job:
CUI-FileServer-Daily: Success
CUI-DatabaseServer-Daily: Success + check transaction log chain is intact
CUI-VMs-Daily: Success
NetworkDeviceConfig-Daily: Success (check git commits in config repo)
Acceptable statuses:
Success: all data backed up; no errors
Success with warnings: job completed but non-critical warnings present
→ Read warning details; document in EV-D27 note field;
resolve warnings within 2 business days if recurring
Unacceptable statuses — act immediately:
Failed: job did not complete
Missed: job did not start
→ See Backup Failure Response below
Step 3 — Verify offsite copy completion
Navigate to: Replication jobs OR check cloud storage console
Confirm: last successful copy to cloud within past 24 hours
If last cloud copy >24 hours ago: escalate to IT Manager
Step 4 — Check storage utilisation
Backup appliance: confirm <80% utilised
Cloud storage: check growth trend
At 80%: create ITSM ticket for storage expansion — complete within 5 days
At 90%: emergency escalation — backup jobs at risk of failure
Step 5 — Document results in EV-D27
EV-D27 is maintained as a running log. Daily entry format:
Date: [YYYY-MM-DD]
Engineer: [name]
Job results:
CUI-FileServer-Daily: [Success / Warning / Failed]
CUI-DatabaseServer-Daily: [Success / Warning / Failed]
CUI-VMs-Daily: [Success / Warning / Failed]
NetworkDeviceConfig-Daily: [Success / Warning / Failed]
SaaS-M365-Weekly: [Success / Warning / Failed / Not scheduled today]
Offsite copy: Confirmed [YYYY-MM-DD HH:MM UTC]
Storage utilisation: [appliance %] / [cloud %]
Issues: [None / description of any issue and action taken]
File: EV-D · BCM → Backup Logs → [current month]
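Teams that script the morning check may find it useful to generate the EV-D27 entry from structured data so the format stays consistent. The following is a sketch only: the class, field names, and storage_action helper are assumptions for illustration, and the thresholds are the 80% and 90% values from Step 4.

```python
from dataclasses import dataclass
from typing import Dict


def storage_action(utilisation_pct: int) -> str:
    """Apply the Step 4 thresholds to a storage utilisation percentage."""
    if utilisation_pct >= 90:
        return "emergency escalation"
    if utilisation_pct >= 80:
        return "raise ITSM ticket for storage expansion"
    return "none"


@dataclass
class DailyBackupEntry:
    date: str                      # YYYY-MM-DD
    engineer: str
    job_results: Dict[str, str]    # job name -> Success / Warning / Failed
    offsite_copy: str              # YYYY-MM-DD HH:MM UTC of last confirmed copy
    appliance_pct: int
    cloud_pct: int
    issues: str = "None"

    def render(self) -> str:
        """Render the entry in the EV-D27 daily log layout."""
        lines = [f"Date: {self.date}", f"Engineer: {self.engineer}", "Job results:"]
        lines += [f"{job}: {status}" for job, status in self.job_results.items()]
        lines += [
            f"Offsite copy: Confirmed {self.offsite_copy}",
            f"Storage utilisation: {self.appliance_pct}% / {self.cloud_pct}%",
            f"Issues: {self.issues}",
        ]
        return "\n".join(lines)
```

The rendered text can be pasted into the running log; it does not replace the engineer's review of the console.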
SIEM integration for backup monitoring
Configure the following alerts in the SIEM to provide continuous backup monitoring
between morning checks:
Alert: Backup job failure
Source: backup management console → syslog to SIEM
Trigger: any job status = Failed or Missed
Severity: High
Response SLA: 2 hours (within business hours), 4 hours (out of hours)
On-call: IT Operations on-call is notified for out-of-hours failures
Alert: Backup console authentication anomaly
Source: backup console audit log → SIEM
Trigger: login failure on backup console OR login from unexpected account
Severity: High
Rationale: backup infrastructure is a high-value target;
ransomware groups specifically target backup consoles
Alert: Cloud storage access from unexpected source
Source: cloud storage access logs → SIEM
Trigger: read or write operation on backup storage bucket/container
from any IP not in the approved backup agent IP list
Severity: Critical
Rationale: cloud backup bucket access by an unknown entity may indicate
ransomware exfiltration or attacker reconnaissance
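The trigger for the cloud storage alert is a source IP outside the approved backup agent list. Real SIEM platforms express this in their own rule language; the sketch below only illustrates the matching logic, using documentation-range addresses as placeholder values.

```python
import ipaddress
from typing import Iterable


def is_approved_source(source_ip: str, approved_networks: Iterable[str]) -> bool:
    """Return True if source_ip falls inside any approved address or CIDR range."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(network, strict=False)
        for network in approved_networks
    )


# Placeholder allow-list; the real list is the approved backup agent IP list.
APPROVED_BACKUP_AGENTS = ["192.0.2.0/28", "198.51.100.7"]
```

An access-log event whose source returns False from is_approved_source would raise the Critical alert described above.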
Backup failure response procedure
When a CUI-scope backup job fails or is missed:
Severity assessment:
Single job, single day, isolated failure → Significant (P3)
Multiple jobs failing → Major (P2)
Multiple jobs failing + no successful offsite copy for >24 hours → Critical (P1)
Evidence of ransomware or attacker activity in backup infrastructure → Critical (P1)
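The severity assessment can be read as a short decision function. This is an illustrative sketch: the parameter names are assumptions, and a single failed job is treated as P3 without testing whether it is isolated, which the engineer still has to judge.

```python
def backup_failure_severity(
    failed_jobs: int,
    hours_since_offsite_copy: float,
    attacker_activity: bool = False,
) -> str:
    """Classify a backup failure using the severity rules above."""
    if attacker_activity:
        return "P1"  # evidence of ransomware or attacker activity
    if failed_jobs > 1 and hours_since_offsite_copy > 24:
        return "P1"  # multiple failures and no recent offsite copy
    if failed_jobs > 1:
        return "P2"  # multiple jobs failing
    if failed_jobs == 1:
        return "P3"  # single job failure (assumed isolated)
    return "none"
```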
P3 response (within 2 hours during business hours):
1. Check backup agent connectivity on source server
ping [source server] from backup appliance
Test-Connection -ComputerName [source] -Count 4
2. Check backup agent service status on source server
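Get-Service -Name [backup agent service name] -ComputerName [source]
(-ComputerName is available in Windows PowerShell 5.1 only; in PowerShell 7, run Get-Service through Invoke-Command -ComputerName [source])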
If stopped: Start-Service or trigger via backup console → retry job
3. Check available space on source and target
If space is insufficient: raise a storage expansion ITSM ticket
4. Check backup agent logs on source server:
Event Viewer → Applications and Services → [backup agent log]
Look for: permissions errors, connectivity timeouts, VSS errors
5. Retry the failed job manually
Monitor to completion
6. If retry fails: escalate to IT Manager
7. Document: what failed, what was investigated, resolution, retry result
Add note to EV-D27 daily log entry
P1/P2 response:
Immediately notify CISO
Assess whether backup data integrity is intact (is existing backup data accessible?)
If ransomware suspected: invoke AT-IR incident response
If backup data appears intact but jobs are failing:
implement manual backup via alternative method until automated system is restored
Document as incident in EV-D12
Restoration testing procedure (generates EV-D28)
Quarterly restoration test
Restoration testing is mandatory quarterly. The test must restore actual CUI data from the offsite (cloud) backup — not the local backup — to confirm the most critical recovery path works.
Test schedule:
Q1: January — restore file server data (sample of CUI files)
Q2: April — restore database (point-in-time recovery test)
Q3: July — restore VM (CUI application server from image backup)
Q4: October — restore from cloud (full offsite restoration exercise)
Pre-test checklist:
[ ] Confirm test environment is available and isolated from production
(Restoration target must NOT be a production system)
[ ] Confirm PAM access to backup console for the test engineer
[ ] Confirm IT Manager is available for sign-off after test
[ ] Confirm CISO is aware test is occurring (notification, not approval)
[ ] Record start date/time in EV-D28
Test environment requirements:
A separate environment must be available for restoration:
For file restoration: a non-production file share or test VM file system
For database: a non-production SQL/Postgres instance
For VM: a non-production hypervisor host or cloud sandbox subscription
Under no circumstances restore over production data during a test
Cloud sandbox account (separate from production) should be maintained
specifically for DR and restoration testing
Quarterly restoration test — step by step
RESTORATION TEST: File server CUI data (Q1 example)
Step 1 — Select test data
Choose a sample folder from the CUI file server backup:
Minimum size: 1 GB
Must contain actual CUI-classified content (not test files)
Must have been backed up in the target backup run being tested
Record: folder path, approximate size, backup job date being tested
Step 2 — Identify the backup version to restore from
Target: the most recent successful offsite (cloud) backup
Navigate to: backup console → cloud backup → [job name] → restore point
Record: backup run date/time, backup type (incremental/full chain)
Step 3 — Initiate restoration to test environment
In backup console:
Select: cloud restore (not local restore — the offsite path is what's being tested)
Target: test server / test file path (NOT production)
Encryption key: retrieve from PAM vault under "Backup-Keys-CUI"
Start restoration and monitor progress
Note start time and expected duration
If restoration takes >150% of expected duration: investigate before proceeding
Step 4 — Verify data completeness
After restoration completes:
File count check:
Source (from backup metadata): [N] files, [X] GB
Restored: [N] files, [X] GB
Match: Yes / No — if No, identify missing files and investigate
Hash verification (spot check):
Select 5 random files from the restored set
Compare SHA-256 hash of restored file against backup metadata hash:
Get-FileHash [restored file path] -Algorithm SHA256
All 5 must match their backup metadata hashes
If any hash mismatch: data integrity failure — escalate to CISO immediately
Step 5 — Verify data is usable
Open restored CUI documents in appropriate application
Confirm: documents open correctly, content is readable, no corruption visible
For databases: execute a sample query against the restored database
For VMs: boot the restored VM image and confirm services start
Step 6 — Verify decryption
The restoration test also confirms that backup encryption is functioning:
If the data was restored and is readable, the decryption succeeded
Additionally: attempt to access the raw backup file on cloud storage
without the decryption key — confirm the raw file is unreadable binary
Step 7 — Clean up
Delete all restored data from the test environment after verification
Confirm deletion: directory listing confirms empty
Log deletion in EV-D28
Step 8 — Complete EV-D28 record
EV-D28 Quarterly Backup Restoration Test Record — [YYYY-QQ]
Test date: [date]
Test type: [File / Database / VM / Full offsite]
Backup source: [job name]
Backup run tested: [date/time of backup run]
Restore-from location: Cloud / Local (must be Cloud: every quarterly test restores from the offsite copy)
Restoration target: [test server / environment name]
Restoration start time: [HH:MM UTC]
Restoration completion time: [HH:MM UTC]
Total duration: [minutes]
Completeness check:
Expected files/GB: [N files, X GB]
Restored files/GB: [N files, X GB]
Match: Yes / No — [if No: investigation outcome]
Hash verification (5 files):
File 1: [filename] — Hash match: Yes / No
File 2: [filename] — Hash match: Yes / No
File 3: [filename] — Hash match: Yes / No
File 4: [filename] — Hash match: Yes / No
File 5: [filename] — Hash match: Yes / No
Data usability confirmed: Yes / No — [notes]
Decryption verification: Pass / Fail
Raw data unreadable without key: Confirmed / Not tested
Test environment cleaned up: Yes — [date/time]
Issues found: None / [description of any issues and resolution]
Tested by: [engineer name and role]
IT Manager review and sign-off: _________________ Date: _________
File at: EV-D · BCM → Restoration Tests → [YYYY-QQ]
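The Step 4 spot check (five random files compared against backup metadata hashes) can be scripted where Get-FileHash is not convenient. The sketch below assumes a manifest mapping each relative file path to its expected SHA-256 hex digest has been exported from the backup metadata; the export format is product-specific, so that manifest is a hypothetical input.

```python
import hashlib
import random
from pathlib import Path
from typing import Dict, List, Optional


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(
    restored_root: Path,
    manifest: Dict[str, str],
    sample_size: int = 5,
    seed: Optional[int] = None,
) -> List[str]:
    """Return sampled relative paths that are missing or whose hash differs."""
    rng = random.Random(seed)
    names = sorted(manifest)
    sample = rng.sample(names, min(sample_size, len(names)))
    failures = []
    for name in sample:
        restored = restored_root / name
        if not restored.is_file() or sha256_of(restored) != manifest[name].lower():
            failures.append(name)
    return failures
```

An empty result means every sampled file matched; any entry in the returned list is a data integrity failure to escalate as described in Step 4.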
Backup key management
Backup encryption key hierarchy:
Master key (HSM-backed):
Stored in: HSM (hardware security module) — cloud KMS with FIPS 140-2 Level 3 HSM
Access: CISO + IT Manager only (dual control for key operations)
Purpose: wraps all data encryption keys (DEKs)
Rotation: annual — requires planned process with CISO approval
Data encryption keys (DEKs) — one per backup tier:
Copy 2 (local backup) DEK:
Stored in: PAM vault, safe "Backup-Keys-CUI"
Access: IT Operations (backup admin role in PAM)
Rotation: annual (aligned with master key rotation)
Copy 3 (offsite/cloud) DEK:
Stored in: PAM vault, safe "Backup-Keys-CUI-Offsite"
Access: IT Manager + CISO only (higher-value: offsite key controls most
sensitive recovery path)
Rotation: annual
Archive DEK (7-year retention tier):
Stored in: PAM vault + physical encrypted backup in CISO's safe
Access: CISO + IT Manager (joint access for archive restoration)
Rotation: 3 years (less frequent to reduce re-encryption burden on archive tier)
Key recovery procedure (if PAM is unavailable):
Emergency decryption keys are stored in a sealed, CISO-signed physical envelope
in the fire safe alongside the break-glass credentials (see FC-03)
Envelope opened only with CISO + IT Manager present
Once the envelope has been opened and the key used, the key is re-sealed immediately in a new CISO-signed envelope
Annual key rotation procedure:
1. Generate new DEK in KMS
2. Re-encrypt backup data with new DEK (most backup platforms support in-place re-encryption)
3. Verify backup jobs complete successfully with new key
4. Run a restoration test using the new key
5. Update PAM vault with new key — revoke old key
6. Log in key management procedure: old key ID, new key ID, rotation date, engineer
OP-02 · Certificate Management
Purpose and scope
This procedure governs the lifecycle of all X.509 certificates used by CUI-scope systems — from request and issuance through monitoring, renewal, and revocation. It implements NIST 800-171 3.13.8 (protect CUI in transit), 3.13.10 (key management), and ISO 27001 Annex A 8.24 (use of cryptography).
Evidence generated: EV-D30 (certificate and key inventory).
Note on browser deadlines: in April 2025 the CA/Browser Forum adopted Ballot SC-081v3 (proposed by Apple), which reduces the maximum validity of public TLS certificates in stages: 200 days from March 2026, 100 days from March 2027, and 47 days from March 2029. Google had earlier proposed a 90-day maximum. Certificate management must increasingly be automated; manual renewal of short-lived certificates is not operationally sustainable. ACME automation is mandatory for all public-facing certificates.
Certificate inventory (EV-D30)
The certificate inventory is the authoritative register of all X.509 certificates in use on CUI-scope systems. It is reviewed monthly and triggers renewal workflows at 60-day and 30-day thresholds.
EV-D30 Certificate Inventory — maintained as a live spreadsheet or
dedicated certificate management platform (e.g. Venafi / Certbot tracking /
HashiCorp Vault PKI)
Required fields per certificate entry:
Field                    Description
──────────────────────────────────────────────────────────────────────
Certificate ID           Unique internal reference (CERT-YYYY-NNN)
Common Name (CN)         Primary domain name or system identifier
Subject Alternative      All SANs; enumerate all; wildcard certs must
Names (SANs)             list the wildcard explicitly
Certificate type         TLS-public / TLS-internal / Code signing /
                         S/MIME / Device / SSH host / CA
Issuing CA               CA name (Let's Encrypt / DigiCert / internal CA
                         name + tier)
Serial number            Certificate serial from the CA
SHA-256 fingerprint      Uniquely identifies this specific cert instance
Issue date               Date certificate was issued
Expiry date              Date certificate expires (the critical tracking field)
Days until expiry        Calculated field; drives alert thresholds
Key algorithm            RSA-2048 / RSA-3072 / ECDSA P-256 / P-384
Key storage              Where the private key is stored (KMS / HSM /
                         local, with justification for local)
System(s) installed on   All systems where this cert is deployed
Auto-renewal             Yes (ACME) / No (manual), with renewal method
Renewal owner            Named role responsible for renewal
Last checked             Date inventory entry was last verified
Status                   Active / Expiring soon / Expired / Revoked
FIPS validated           Yes / No (required Yes for CUI-scope certs)
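The "Days until expiry" field is derived from the certificate's notAfter value. A minimal sketch of that calculation follows, assuming the date is supplied in the text form printed by openssl x509 -noout -dates (for example "Mar 31 12:00:00 2025 GMT"); the function name is illustrative.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days from now until the certificate's notAfter time.

    not_after uses the OpenSSL text form, e.g. "Mar 31 12:00:00 2025 GMT".
    now must be timezone-aware. The result is negative once the
    certificate has expired.
    """
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    expiry = datetime.fromtimestamp(expiry_epoch, tz=timezone.utc)
    return (expiry - now).days
```

Because timedelta.days rounds toward negative infinity, a certificate that expired one hour ago yields -1 rather than 0, which keeps expired entries clearly distinguishable.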
Monthly inventory review process
First Monday of each month — IT Operations reviews EV-D30:
Filter 1: Expiry within 60 days → Renewal-amber alert
Action: Confirm renewal is in progress or scheduled
If auto-renewal (ACME): verify automation is working (check renewal logs)
If manual: create ITSM renewal ticket with deadline date
Filter 2: Expiry within 30 days → Renewal-red alert
Action: Escalate to IT Manager; confirm renewal is on track
If renewal is not in progress: treat as high-priority ITSM ticket
Notify service owner that service may be disrupted if not renewed
Filter 3: Expiry within 7 days → Emergency
Action: Notify CISO; implement emergency renewal; prepare for
potential service impact
Emergency renewal procedure: see Emergency Renewal section below
Filter 4: Already expired → Incident
An expired certificate on a production CUI-scope system is a
security incident (the system may be operating insecurely or
be inaccessible)
Create incident record in EV-D12; notify CISO; replace immediately
Filter 5: SHA-1 or MD5 certificates → Immediate action
Any certificate using SHA-1 or MD5 signature algorithm must be
replaced immediately regardless of expiry date
These algorithms are deprecated — a finding in any assessment
Update inventory: after each review, update the "Last checked" field
and update status for any changes
File updated EV-D30 in: EV-D · Cryptography → Certificate Inventory → [YYYY-MM]
Certificate types and issuance procedures
Public-facing TLS certificates — ACME automation (mandatory)
All certificates for internet-accessible endpoints must use ACME automation.
Manual renewal for public-facing certificates is not permitted —
the operational risk of missed renewal at 90-day (and shorter future)
validity is too high.
Recommended ACME client: Certbot (for Linux) / win-acme (for Windows) /
Caddy (if used as reverse proxy) / cert-manager (Kubernetes)
CA: Let's Encrypt (DV certificates, free, widely trusted) for standard web services
DigiCert / Sectigo (OV/EV certificates, required for specific contract contexts)
Note: Let's Encrypt certificates are domain-validated — they prove domain control
but do not verify the organisation's identity. For services where organisation
identity is contractually required (some government portals), use OV certificates.
ACME setup — Linux with Certbot:
Install: apt install certbot python3-certbot-nginx
Initial certificate:
certbot --nginx -d [domain.com] -d [www.domain.com] \
--email [it-ops@organisation.com] \
--agree-tos --no-eff-email
Auto-renewal timer: certbot creates a systemd timer automatically
Verify: systemctl status certbot.timer
Timer fires twice daily — attempts renewal when cert has <30 days remaining
Verify auto-renewal is working:
certbot renew --dry-run
Expected: "Simulating renewal of an existing certificate" — success
SIEM alert for renewal failure:
Monitor /var/log/letsencrypt/letsencrypt.log for "renewal failed"
Ship log to SIEM via rsyslog
Create SIEM alert: renewal failure → High severity → IT Operations alert
ACME setup — Windows with win-acme:
Download: github.com/win-acme/win-acme
Run: wacs.exe --source iis --host [domain.com] --installation iis
Creates Windows Task Scheduler task for auto-renewal
Verify: run task manually, confirm no errors
Hook for SIEM notification:
wacs.exe supports notification scripts on renewal success/failure
Configure failure notification to send syslog event to SIEM
Certificate deployment post-renewal:
Some services require explicit reloading after cert renewal:
nginx: sudo nginx -s reload (add as --deploy-hook in certbot)
Apache: sudo apachectl graceful
IIS: automatically picks up cert if bound correctly — verify binding
Verify deployment:
openssl s_client -connect [domain.com]:443 -servername [domain.com] </dev/null 2>/dev/null | \
openssl x509 -noout -dates
Confirm: notAfter is the newly issued certificate's expiry
Internal TLS certificates — internal CA
For internal services (management interfaces, internal APIs, internal
monitoring systems) that don't require public CA validation:
Internal CA hierarchy:
Root CA (offline — air-gapped):
Validity: 10 years
Key: RSA-4096 stored on FIPS 140-2 HSM
Location: physically secured (CISO's custody)
Used only to sign subordinate CAs — never issues end-entity certs directly
Issuing CA (online — internal network):
Validity: 5 years
Key: RSA-3072 or ECDSA P-384, HSM-backed if available; KMS minimum
Signed by: Root CA (requires bringing Root CA online — planned event)
Issues: end-entity certificates for internal services
CRL/OCSP: must be reachable from all internal systems that validate certs
OCSP URL: http://[internal-ocsp-server]/ocsp
CRL URL: http://[internal-crl-server]/crl/issuing.crl
Requesting an internal certificate:
1. Generate key pair on the target system (key never leaves the system):
# Linux:
openssl genrsa -out [service].key 3072
openssl req -new -key [service].key -out [service].csr \
-subj "/CN=[service.internal.domain]/O=[Org Name]/C=GB" \
-addext "subjectAltName=DNS:[service.internal.domain]"
# Windows (certreq with an INF request file):
# request.inf must contain at minimum:
#   [NewRequest]
#   Subject = "CN=[service.internal]"
#   KeyLength = 3072
#   Exportable = FALSE
#   MachineKeySet = TRUE
certreq -new request.inf [service].csr
2. Submit CSR to internal CA:
# If using Microsoft AD CS:
certreq -submit -config "[CA-server]\[CA-name]" [service].csr [service].cer
# If using CFSSL / Step CA / Vault PKI:
[tool-specific command]
3. Install signed certificate on target system
4. Register in EV-D30 certificate inventory:
Add entry with all required fields
Set renewal reminder based on validity period
Internal cert validity periods:
Standard services: 365 days (1 year)
High-churn environments (frequently rebuilt): 90 days with ACME automation
Root CA: 3650 days (10 years)
Issuing CA: 1825 days (5 years)
SSH host certificates (from SSH CA): 365 days
Short-lived SSH user certs (from PAM): 8 hours
Code signing certificates
Code signing is used to sign scripts and binaries deployed internally,
satisfying the software restriction policies in AT-CM.
Certificate type: Code signing certificate from internal CA
OR from commercial CA (if external distribution required)
Key storage: offline HSM (USB hardware token — YubiKey or nShield)
Custodian: CISO (primary) + IT Manager (secondary)
Key usage: digitalSignature only (not key encipherment)
Validity: 3 years
Archive requirements: keep all code signing certs permanently —
code signed with a cert must be verifiable for
the life of the signed code
Signing procedure:
# Windows (PowerShell — requires signing cert in cert store):
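# Select a single certificate; the store may hold more than one code signing cert
$cert = Get-ChildItem -Path Cert:\CurrentUser\My -CodeSigningCert | Select-Object -First 1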
Set-AuthenticodeSignature -FilePath [script.ps1] -Certificate $cert
# Verify signature:
Get-AuthenticodeSignature [script.ps1]
Expected: Status = Valid, SignerCertificate = [org CN]
# Linux binary signing (via gpg or sigstore):
[process depends on toolchain — document specific procedure for each
signed software type]
Timestamp countersignature:
Always use a trusted timestamp authority when signing code:
-TimestampServer http://timestamp.digicert.com
Reason: timestamped signatures remain valid after the signing cert expires
Certificate revocation procedure
When to revoke a certificate immediately (within 4 hours):
- Private key is compromised or suspected compromised
- Certificate was issued in error (wrong CN, wrong organisation)
- System the certificate is installed on has been decommissioned
- Security incident where private key may have been exposed
Revocation process:
For Let's Encrypt / ACME certificates:
certbot revoke --cert-path /etc/letsencrypt/live/[domain]/cert.pem \
--reason keyCompromise
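Let's Encrypt processes the revocation within minutes. Client enforcement is not immediate: major browsers rely on periodically updated revocation lists rather than real-time OCSP lookups, so replace the certificate and key without waiting for revocation to propagate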
For internal CA certificates:
1. Access the CA management interface
2. Find the certificate by serial number
3. Revoke with reason code (keyCompromise / affiliationChanged /
superseded / cessationOfOperation)
4. Update CRL immediately: force CRL publication rather than waiting
for scheduled publication
5. Verify: openssl crl -in [CRL file] -noout -text | grep -i -A1 [serial number]
(add -inform DER if the CRL file is DER-encoded)
Confirm the serial appears in the CRL with the correct revocation date
For device certificates (MDM-enrolled):
Intune: Devices → [device] → Revoke certificate
Jamf: Computers → [computer] → Management → Revoke certificate
The revoked certificate will be removed and re-issued on next MDM check-in
After revocation:
Update EV-D30: change status to "Revoked", add revocation date and reason
If key was compromised: log as security incident in EV-D12
Issue replacement certificate immediately (do not leave service without valid cert)
Emergency certificate renewal
When a certificate expiry is discovered with less than 7 days remaining
and the standard renewal process cannot complete in time:
Step 1 — Assess impact
Is this certificate currently causing service disruption (expiry in the past)?
Which systems and users are affected?
Is this a CUI-scope service?
Step 2 — Notify
Immediately notify IT Manager
If CUI service is disrupted: notify CISO within 1 hour
If public-facing service causing customer impact: notify via incident process
Step 3 — Emergency issuance
For ACME-managed certs that have failed auto-renewal:
certbot renew --force-renewal --cert-name [certname]
Investigate why auto-renewal failed after emergency renewal is complete
For manually managed certs expiring within 24 hours:
Use 90-day Let's Encrypt cert as emergency replacement while
longer-validity cert is properly procured and installed
For internal CA certs:
Use emergency CA operation procedure — bring signing CA online,
issue 90-day emergency cert, install, then issue proper 365-day cert
via normal process within 7 days
Step 4 — Install and verify
Install replacement certificate
Verify service is accessible:
curl -I https://[service-url] → should return 200
openssl s_client -connect [service]:443 </dev/null 2>/dev/null |
openssl x509 -noout -dates → confirm new expiry date
Step 5 — Post-incident review
Identify why the certificate expiry was not caught by the inventory process
Create corrective action in EV-A03
Confirm EV-D30 is updated with the new certificate details
Update auto-renewal automation if the failure was automation-related
OP-03 · Logging and SIEM Management
Purpose and scope
This procedure governs the operation, maintenance, and health monitoring of the SIEM platform and the log collection infrastructure. It implements NIST 800-171 controls 3.3.1 through 3.3.9 (Audit and Accountability family) and ISO 27001 Annex A 8.15 (logging), 8.16 (monitoring), and 8.17 (clock synchronisation).
Evidence generated: EV-F01 (monthly log review), EV-F06 (SIEM health report).
SIEM platform operational overview
SIEM platform: [product name — e.g. Microsoft Sentinel / Splunk / IBM QRadar /
Elastic SIEM — specify deployed product]
SIEM location: [cloud-hosted / on-premises]
SIEM admin console: [URL or access path]
SIEM admin access: PAM-mediated — privileged account checkout required
Maximum 2 SIEM admin accounts: CISO + IT Manager
SIEM analyst access: [role-based access — see RBAC configuration below]
Retention tiers:
Hot (searchable online): 90 days
Warm (retrievable within 24 hours): days 91–365
Archive (retrievable within 72 hours): days 366–1095 (36 months total)
Retention enforcement: automated tiering configured in SIEM platform
Do not manually delete log data
Do not modify retention settings without CISO approval and RFC
Legal hold: certain log segments can be set to legal hold —
exempts them from normal retention lifecycle
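The retention tiers are defined purely by log age, so the tier for any log segment can be computed from its age in days. This is a sketch of that mapping only; tiering itself is enforced by the SIEM platform, and the function name and "expired" label are illustrative.

```python
def retention_tier(age_days: int, legal_hold: bool = False) -> str:
    """Map log age in days to the retention tier defined above."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days <= 90:
        return "hot"
    if age_days <= 365:
        return "warm"
    if age_days <= 1095:
        return "archive"
    # Past 36 months: retained only if a legal hold exempts the segment.
    return "legal hold" if legal_hold else "expired"
```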
Log source management
Adding a new log source
Every new CUI-scope system must be added to the SIEM as a log source within 5 business days of deployment. Failure to add a system is a gap against NIST 800-171 3.3.1.
Step 1 — Identify the log source type and forwarding method
Reference: AT-AU Section 3 (log source inventory) for required event categories
Determine forwarding method:
Windows → Windows Event Forwarding (WEF) to Windows Event Collector (WEC)
OR direct SIEM agent installation
Linux → rsyslog + audisp-syslog → SIEM syslog listener
Network devices → syslog to SIEM syslog listener (UDP 514 or TCP 514/6514)
Cloud → native connector (Sentinel Data Connector / Splunk Add-on / etc.)
SaaS (M365, AWS) → platform-native SIEM integration
Step 2 — Configure the source system for log forwarding
Windows via WEF:
On source system (GPO or local policy):
Computer Configuration → Administrative Templates → Windows Components →
Event Forwarding → Configure target Subscription Manager:
Server=http://[WEC server FQDN]:5985/wsman/SubscriptionManager/WEC,Refresh=60
On WEC server:
wecutil cs [subscription-name].xml (create subscription)
Verify the subscription is active and sources are connected: wecutil gr [subscription-name]
WEF Subscription XML minimum content:
<QueryList>
<Query Id="0">
<Select Path="Security">*</Select>
<Select Path="System">*[System[(EventID=7045)]]</Select>
<Select Path="Microsoft-Windows-PowerShell/Operational">
*[System[(EventID=4104)]]
</Select>
</Query>
</QueryList>
Linux via rsyslog:
Edit /etc/rsyslog.d/00-siem.conf:
*.* @@[SIEM-IP]:514 # TCP (recommended)
# Or: *.* @[SIEM-IP]:514 # UDP
Configure audisp to forward auditd events via syslog:
Edit /etc/audisp/plugins.d/syslog.conf (audit 3.x and later: /etc/audit/plugins.d/syslog.conf):
active = yes
direction = out
path = builtin_syslog
type = builtin
args = LOG_INFO
format = string
Restart: systemctl restart rsyslog auditd
(RHEL-family systems refuse a systemctl restart of auditd; use: service auditd restart)
Test: logger "Test SIEM forwarding from [hostname]"
Verify test message appears in SIEM within 60 seconds
Network device (syslog):
Configure syslog on the device (vendor-specific commands):
Example (Cisco IOS):
logging host [SIEM-IP] transport tcp port 514
logging trap informational
logging on
service timestamps log datetime msec
ntp server [NTP-IP] (ensure timestamps are synchronised)
Verify: trigger a known event (interface flap, authentication)
and confirm event appears in SIEM
Step 3 — Create the log source entry in SIEM
In the SIEM admin console:
Add new data source / log source
Configure parser/normalisation (use standard parsers where available)
Set expected event volume (used for health monitoring)
Configure field extraction mappings (IP, user, event type, etc.)
Step 4 — Verify receipt and parsing
After configuration:
Confirm events are arriving: SIEM search for source=[new-system]
Confirm events are parsed correctly: key fields (user, event ID, timestamp)
are populated — not raw strings
If events arrive but are not parsed: check parser configuration;
may need custom field extraction rule
Step 5 — Create health monitoring entry
In SIEM health monitoring configuration:
Add the new source to the "expected sources" list
Set the "silent alert" threshold: 60 minutes for Critical sources;
4 hours for High sources; 24 hours for Medium sources
Reference the log source tier from AT-AU Section 3:
Identity systems (Entra ID, AD) → Critical → 60-minute silence alert
Network boundary (firewall, IDS) → Critical → 60-minute silence alert
Endpoints → High → 4-hour silence alert
Applications → Medium/High per sensitivity
Step 6 — Update the log source inventory in AT-AU Section 3
Add the new source to the table with:
Source name and type
Required event categories
Forwarding mechanism
Expected daily event volume
Priority tier
Step 7 — Update EV-D19 (SIEM configuration baseline)
Add the new source to the baseline document
Record: date added, system name, forwarding method, SIEM data source name
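The silence thresholds from Step 5 can be sketched as a simple check over last-seen timestamps. This is a hedged illustration only: the function name `silent_sources` and the input shapes (two dictionaries keyed by source name) are assumptions, and a real deployment would use the SIEM's own health monitoring feature rather than external code.

```python
from datetime import datetime, timedelta

# Silence thresholds per priority tier, as defined in Step 5.
SILENCE_THRESHOLDS = {
    "Critical": timedelta(minutes=60),
    "High": timedelta(hours=4),
    "Medium": timedelta(hours=24),
}


def silent_sources(last_seen: dict, tiers: dict, now: datetime) -> list:
    """Return (source, tier, silence) tuples for sources past their threshold."""
    overdue = []
    for source, tier in tiers.items():
        seen = last_seen.get(source)
        # A source that has never reported is treated as maximally silent.
        silence = now - seen if seen is not None else timedelta.max
        if silence > SILENCE_THRESHOLDS[tier]:
            overdue.append((source, tier, silence))
    # Longest silence first so the worst gap is reviewed first.
    return sorted(overdue, key=lambda item: item[2], reverse=True)
```

Keeping the thresholds in one table means a tier change in AT-AU Section 3 only needs updating in a single place.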
Removing a decommissioned log source
When a CUI-scope system is decommissioned:
1. Decommission the log source in SIEM admin console
(retain historical data — only remove the active collection)
2. Remove from the expected sources health monitoring list
3. Update AT-AU Section 3 log source inventory: mark as decommissioned with date
4. Update EV-D19 baseline
5. Retain historical log data for the full retention period
(a decommissioned system's logs remain part of the audit trail)
NTP synchronisation management
SIEM log correlation depends on all sources having synchronised clocks. On Windows, drift of more than 5 minutes (the default Kerberos maximum clock skew) causes Kerberos authentication to fail, which acts as an automatic enforcement of clock sync. Smaller drift (seconds to minutes) goes unenforced but still corrupts log correlation and creates ambiguity in investigation timelines.
NTP hierarchy:
Upstream public time sources (typically Stratum 1–2; they derive time from Stratum 0 reference clocks such as GPS receivers or atomic clocks):
CUI-scope systems must not contact these directly; only the organisational NTP server does:
time.google.com
uk.pool.ntp.org (UK zone of pool.ntp.org)
Organisational NTP server (syncs from the upstream sources, so it sits one stratum below them):
Server: [internal NTP server hostname / IP]
Implementation: ntpd or chrony on a dedicated or shared Linux server
Authentication: NTP symmetric key or NTS (Network Time Security) to prevent NTP spoofing
chrony configuration (/etc/chrony.conf):
server time.google.com iburst prefer
server uk.pool.ntp.org iburst
allow [internal network CIDR] # Allow internal clients
local stratum 10 # Serve time even if not synced
keyfile /etc/chrony.keys
log measurements statistics tracking
CUI-scope systems (sync from the internal NTP server only, one stratum below it):
Windows: configure via GPO
Computer Configuration → Administrative Templates → System →
Windows Time Service → Time Providers → Configure Windows NTP Client:
NtpServer: [internal-ntp-server],0x9
Type: NTP
Linux: chrony or ntpd pointing to internal NTP server:
server [internal-ntp-server] iburst
Verify: chronyc tracking
Acceptable offset: <0.1 seconds; Alert threshold: >60 seconds
Network devices: vendor-specific NTP configuration (see BL-NET baseline)
Monitoring for clock drift:
SIEM monitor: if any source timestamp differs from SIEM server clock by
>60 seconds, generate High alert
Windows domain controller check:
w32tm /query /status /verbose /computer:[DC-hostname]
Key field: "Phase Offset" (should be <1 second for domain members)
Linux check:
chronyc tracking | grep "RMS offset"
chronyc sources -v (shows all peers and their offsets)
The SIEM server itself must be synced:
Its timestamp is the reference for all correlation
SIEM server clock offset should be <100 milliseconds from NTP source
Monitor the SIEM server's NTP status as part of EV-F06 monthly health check
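The Linux check above can be scripted by parsing the "System time" line of `chronyc tracking`. The following is a minimal sketch: the sample output is illustrative (it follows the layout shown in the chrony documentation), the function names are hypothetical, and the "investigate" band between the 0.1-second acceptable offset and the 60-second alert threshold is an assumption rather than part of this procedure.

```python
import re

ACCEPTABLE_OFFSET_S = 0.1  # acceptable offset for CUI-scope systems
ALERT_OFFSET_S = 60.0      # SIEM High alert threshold

# Illustrative `chronyc tracking` output; real values will differ.
SAMPLE_TRACKING = """\
Reference ID    : CB00710F (ntp.example.internal)
Stratum         : 3
System time     : 0.000006523 seconds slow of NTP time
Last offset     : -0.000006747 seconds
RMS offset      : 0.000035822 seconds
"""


def parse_system_offset(tracking_output: str) -> float:
    """Extract the absolute 'System time' offset in seconds."""
    match = re.search(
        r"^System time\s*:\s*([0-9.]+) seconds (?:slow|fast)",
        tracking_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'System time' line found")
    return float(match.group(1))


def classify_offset(offset_s: float) -> str:
    """Map an absolute offset to ok / investigate / alert."""
    if offset_s > ALERT_OFFSET_S:
        return "alert"
    if offset_s >= ACCEPTABLE_OFFSET_S:
        return "investigate"
    return "ok"
```

In practice the text passed to `parse_system_offset` would come from running `chronyc tracking` on the host being checked.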
SIEM correlation rule management
Correlation rules (also called detection rules, analytics, or alert rules
depending on the SIEM product) require lifecycle management:
Rule library location:
[SIEM platform] → Analytics → Active Rules
Rules are also version-controlled in the configuration management
repository (GitLab/ADO) — SIEM rule definitions exported as YAML/JSON
and committed to the repo, allowing change tracking and rollback
Rule categories for CUI-scope monitoring:
Category 1 — Credential and identity threats:
MFA approval from new country/device combination (impossible travel)
MFA fatigue: >5 MFA push notifications in 5 minutes to same account
Password spray: >10 failed logins across >10 accounts in 5 minutes from one IP
Admin account login outside business hours without prior notification
Legacy authentication success (should never succeed — CA policy blocks it)
Break-glass account sign-in (any use is a Critical alert)
Category 2 — Endpoint threats:
Windows Event Log cleared (Event ID 1102) — attacker evidence removal
PowerShell script execution with suspicious keywords (Invoke-Expression,
DownloadString, EncodedCommand) — Event ID 4104
New service installed (Event ID 7045) — common malware persistence
Process spawning anomaly (Word/Excel spawning cmd.exe or PowerShell)
USB storage device connected (vendor-specific EDR event)
AV threat detected and NOT remediated within 30 minutes
Category 3 — Network threats:
Data transfer volume anomaly: source sending >5× their 30-day baseline
to an external destination
Connection to an IP or domain flagged by the threat intel feed (for example, infrastructure linked to exploitation of CISA KEV vulnerabilities)
DNS query to DGA-pattern domain (high-entropy domain names)
Firewall rule modification outside maintenance window
New outbound port opened that was not previously used
Category 4 — Audit infrastructure threats:
SIEM log source silent for >60 minutes (Critical sources)
Auditd service stopped on any CUI-scope Linux server
SIEM configuration change (any change to retention, sources, or rule config)
Log volume from a source drops >50% from 7-day baseline (possible suppression)
Category 5 — CUI access anomalies:
CUI file share access after 22:00 or before 06:00
CUI file share access from account not normally accessing CUI
Mass download from CUI file share (>500 files in 10 minutes by one account)
CUI file share access during an active leaver process (account in 90-day hold)
Rule update process:
New rules require CISO approval and a Normal RFC (AT-CM change process)
Changes to existing rules: Normal RFC
Emergency rule creation (during an active incident): Emergency RFC,
retrospective CISO approval within 24 hours
Rule deletion: Normal RFC; rule is archived (exported to git), not deleted,
so it can be re-activated if needed
All rule changes are logged in the SIEM admin audit trail
SIEM admin audit trail itself is monitored for unexpected changes
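To make the threshold logic of a Category 1 rule concrete, the password spray condition (more than 10 failed logins across more than 10 accounts in 5 minutes from one IP) can be sketched as a sliding window. This is an illustration under stated assumptions, not the deployed rule: the function name, the tuple layout of the input events, and the example IP addresses are hypothetical, and the production rule would be written in the SIEM's own query language.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
MIN_FAILURES = 10  # rule fires when strictly exceeded
MIN_ACCOUNTS = 10  # rule fires when strictly exceeded


def password_spray_ips(failed_logins) -> set:
    """failed_logins: iterable of (timestamp, source_ip, account) tuples."""
    by_ip = defaultdict(list)
    for ts, ip, account in failed_logins:
        by_ip[ip].append((ts, account))

    flagged = set()
    for ip, events in by_ip.items():
        events.sort()
        window = deque()
        for ts, account in events:
            window.append((ts, account))
            # Drop events older than the 5-minute window ending at `ts`.
            while ts - window[0][0] > WINDOW:
                window.popleft()
            accounts = {acc for _, acc in window}
            if len(window) > MIN_FAILURES and len(accounts) > MIN_ACCOUNTS:
                flagged.add(ip)
                break
    return flagged
```

Counting distinct accounts separately from total failures is what distinguishes a spray from a brute-force attempt against a single account, which this sketch deliberately does not flag.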
Monthly log review procedure (generates EV-F01)
The monthly log review is conducted by the Security Analyst in the first week of each month, covering the preceding calendar month.
Target completion: by the 7th of each month
Duration: approximately 2 hours
Conducted by: Security Analyst (primary), CISO (review and sign-off)
STEP 1 — SIEM health check (15 minutes)
A. Open SIEM health dashboard
Confirm all expected log sources show as active
Review any sources that went silent in the past month:
Source name, silence duration, resolution
B. Verify log volume is within expected ranges
Sources with significantly lower volume than the 30-day baseline:
Investigate — low volume may indicate suppression or forwarding issue
Sources with significantly higher volume than baseline:
Note — may be legitimate (security incident, patch activity) or
may indicate log flooding / misconfiguration
C. Check storage utilisation trends
Hot tier: [X]% used — trend from last month
Warm tier: [X]% used
Action if >80%: create ITSM ticket for capacity expansion
STEP 2 — Alert queue review (30–60 minutes)
Open SIEM alert queue: filter to past month
Work through all alerts not yet closed from the previous period
For each alert:
Triage: True Positive / False Positive / Under Investigation
True Positive: link to incident record in EV-D12
False Positive: document why and consider rule tuning
Under Investigation: assign to analyst with target closure date
Summary metrics to record:
Total alerts: [N]
True positives: [N] → incidents raised: [N]
False positives: [N] → rule tuning actions: [N]
Still under investigation: [N]
STEP 3 — Privileged account activity review (20 minutes)
SIEM search: (source=AD OR source=EntraID)
AND (username like "adm-%" OR role="GlobalAdmin")
AND timeframe=[last month]
Review for:
Admin logins outside business hours not corresponding to a change window
Admin logins from unexpected locations
Admin actions outside their normal system scope
Any use of break-glass accounts (should be zero — any use = escalate to CISO)
Document: "Privileged activity reviewed — [N] events — [Clean / Anomalies found]"
If anomalies found: describe each and document investigation outcome
STEP 4 — CUI system access review (20 minutes)
SIEM search: (source=[CUI-fileserver] OR source=[CUI-database])
AND timeframe=[last month]
Review for:
After-hours access events (outside 07:00–20:00 local time)
Access by accounts not normally accessing CUI (new first-time accesses)
Large file access events (>100 files accessed in one session)
Access denied events (repeated failures may indicate reconnaissance)
Document: "CUI access reviewed — [N] events — [Clean / Anomalies found]"
STEP 5 — Network boundary review (15 minutes)
Firewall summary: total permitted / denied connections (monthly totals)
IDS/IPS: alert count by severity — Critical: [N], High: [N], Medium: [N]
DNS anomalies: NXDOMAIN spike count vs previous month
Web proxy: blocked malicious categories count
Flag for investigation:
IDS Critical alerts that were not escalated to the incident process
DNS NXDOMAIN count >200% of prior month baseline
Document: "Network boundary reviewed — [summary stats] — [Clean / Anomalies]"
STEP 6 — Authentication anomaly review (15 minutes)
SIEM search: authentication failures > [threshold] for any account
Failed login volume: total by system
Account lockout events: list accounts locked more than once in the month
MFA anomalies: denied notifications (MFA fatigue indicators)
Successful logins from new countries or devices
Document: "Authentication reviewed — [summary] — [Clean / Anomalies]"
STEP 7 — Sign off EV-F01
EV-F01 Monthly SIEM Log Review Record — [YYYY-MM]
Review period: [first date] to [last date of month]
Reviewed by: [Security Analyst name]
Review date: [date completed]
1. SIEM Health
Log sources active: [N of N expected]
Log sources with gaps: [list or "None"]
Storage: Hot [%] / Warm [%]
Issues: [None / description]
2. Alert Queue
Total alerts in period: [N]
True positives: [N] — Incidents raised: [list incident IDs or "None"]
False positives: [N] — Rule tuning actions: [list or "None"]
3. Privileged Account Activity
Events reviewed: [N]
Assessment: Clean / Anomalies found
Anomalies: [description or "None"]
4. CUI Access Activity
Events reviewed: [N]
Assessment: Clean / Anomalies found
Anomalies: [description or "None"]
5. Network Boundary
Assessment: Clean / Anomalies found
IDS/IPS summary: Critical [N], High [N], Medium [N]
6. Authentication
Assessment: Clean / Anomalies found
Lockouts: [accounts locked more than once]
Overall assessment: Normal / Anomalies identified (all investigated) /
Incidents opened (see above)
Security Analyst sign-off: _________________ Date: _________
CISO review: _________________ Date: _________
File at: EV-F · Continuous Monitoring → Log Reviews → [YYYY-MM]
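The after-hours criterion used in Step 4 (access outside 07:00–20:00 local time) is easy to get wrong at the boundaries, so a small sketch may help when exporting events for manual review. The function names and the event tuple layout are hypothetical, timestamps are assumed to be already converted to local time, and treating 20:00 exactly as after hours is an assumption.

```python
from datetime import datetime, time

BUSINESS_START = time(7, 0)
BUSINESS_END = time(20, 0)


def is_after_hours(local_ts: datetime) -> bool:
    """True when the event falls outside 07:00-20:00 local time."""
    return not (BUSINESS_START <= local_ts.time() < BUSINESS_END)


def after_hours_events(events) -> list:
    """events: iterable of (local_timestamp, account, path) tuples."""
    return [event for event in events if is_after_hours(event[0])]
```

If the SIEM stores timestamps in UTC, they would need converting to local time before this check, otherwise daylight saving shifts the window by an hour.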
Monthly SIEM health report (generates EV-F06)
EV-F06 is distinct from EV-F01. EV-F01 records what was found in the logs. EV-F06 records whether the audit infrastructure itself is functioning correctly.
EV-F06 SIEM Health Report — [YYYY-MM]
Produced by: IT Manager
Target completion: by the 10th of each month
Section 1 — Log source inventory status
For each source in the log source inventory (AT-AU Section 3):
| Source name | Type | Expected daily vol | Actual daily vol | Last event | Status |
Status values:
Active: events received within expected window
Degraded: events received but volume significantly below expected
Gap: no events received for [N] hours/days during the month
(If Gap: note gap duration and resolution)
Section 2 — Retention verification
Sample retrieval test — monthly:
Randomly select a date between 9 and 11 months ago
Search for events from that date: source=[sample source]
timeframe=[that specific date]
Confirm events are retrievable:
Test date: [date]
Events found: Yes / No
Retrieval time: [duration; warm tier target is within 24 hours]
Result: Pass / Fail
Section 3 — Storage capacity
Hot tier: [X]% of [total capacity]
Warm tier: [X]% of [total capacity]
Growth trend: [±X]% month-over-month
Projected time to 80% (hot tier): [N] months at current growth rate
Section 4 — Tamper evidence check
SIEM admin audit trail review:
Configuration changes in the past month: [N]
Each change: [date, what changed, who made the change]
Unexpected changes: None / [description and investigation]
Log deletion events: [N — expected: 0, unless automated retention lifecycle]
Log integrity: [All hashes verified / Issues found]
SIEM immutability status: Confirmed active / Degraded / Failed
Section 5 — SIEM admin role holders
Current SIEM administrators: [Name 1 — CISO] / [Name 2 — IT Manager]
Change from prior month: None / [description of any change]
(Maximum 2 admin role holders — any change requires CISO approval)
Section 6 — NTP synchronisation status
SIEM server clock offset from NTP: [X ms] — [Pass if <100ms]
Any clock sync failures reported by SIEM during month: None / [description]
AD/DC NTP offset sample:
[DC-01]: [X ms] — Pass/Fail
[DC-02]: [X ms] — Pass/Fail
Section 7 — Log integrity verification
Monthly hash verification of archived log segments:
Sample period: [month N-12]
Hashes verified: [N segments]
Hash match: Yes — all segments verified / No — [describe failure]
Section 8 — Sign-off
IT Manager review: _________________ Date: _________
CISO review: _________________ Date: _________
File at: EV-F · Continuous Monitoring → SIEM Health → [YYYY-MM]
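The hash verification recorded in Section 7 can be sketched as a comparison of archived segment files against a stored manifest. This is a minimal illustration that assumes SHA-256 digests and a manifest mapping segment filename to expected hex digest; the actual hash algorithm, manifest format, and segment naming depend on the deployed SIEM, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large segments do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_segments(manifest: dict, base_dir: Path) -> dict:
    """Return {segment name: 'match' | 'mismatch' | 'missing'}."""
    results = {}
    for name, expected in manifest.items():
        path = base_dir / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_of(path) == expected.lower():
            results[name] = "match"
        else:
            results[name] = "mismatch"
    return results
```

Any result other than "match" would be recorded under "Hash match: No" in Section 7 with a description of the failure.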
OP-04 · Infrastructure Change Management
Purpose and scope
This procedure defines the change management process for all CUI-scope infrastructure changes. It implements NIST 800-171 controls 3.4.3 (track, review, approve, and log changes), 3.4.4 (security impact analysis), and 3.4.5 (access restrictions for change), and aligns with ISO 27001 Annex A 8.32 (change management).
Evidence generated: EV-D21 (change management records).
Change categories — decision guide
Before creating any RFC, determine the correct change category:
STANDARD (pre-approved — no CAB review required):
Pre-defined, low-risk, well-understood changes with documented
implementation steps. The change type itself has been approved by the
CAB — individual instances do not need separate review.
Examples:
OS patching within the defined SLA window via MDM/WSUS
Certificate renewal via ACME automation (no manual steps)
LAPS password rotation (fully automated)
Helpdesk password resets for standard user accounts
Adding a user to an approved, pre-defined access group
Standard change library: maintained in ITSM as a catalogue
If a proposed change is similar to but not exactly a standard change type:
→ Use Normal category. Do not force-fit into Standard.
Evidence: ITSM ticket is created automatically for tracking;
no RFC number needed; logged in EV-D21 as standard change
NORMAL (individual CAB review — 48-hour minimum review period):
Changes that require individual assessment because their specific
characteristics, scope, or risk cannot be fully pre-defined.
Examples:
New GPO setting or modification to existing GPO on CUI-scope OU
New MDM configuration profile deployment
Firewall rule addition or modification
New software deployment to CUI-scope endpoints
Database schema change on CUI-scope database
Network segmentation change
New API integration or external system connection
Baseline update following CIS Benchmark revision
CAB composition: IT Manager (chair) + CISO + Network Engineer +
rotating business representative
CAB meeting: weekly (Wednesday 10:00) or on-demand for urgent items
Approval requirement: CAB chair sign-off minimum; CISO for CUI-impacting changes
Post-implementation review: required within 48 hours
MAJOR (CAB + CISO + senior management notification):
Changes with potential to materially affect the security posture,
the CUI system boundary, or the SSP description.
Examples:
New platform added to CUI scope (new OS type, new cloud service)
Identity provider migration or significant configuration change
Firewall architecture change (new DMZ zone, perimeter redesign)
CUI data flow change (new source or destination of CUI)
Change that requires SSP system boundary update
Decommissioning a CUI-scope system and redistributing its function
Approval: CAB + CISO + CEO/COO notification
Review period: 5 business days minimum
SSP update: required within 30 days if SSP boundary or control description changes
Rollback plan: mandatory, and must be tested in staging before production
EMERGENCY (immediate; document within 24 hours):
Changes required to prevent or respond to an active security incident
or critical service outage where the risk of delay exceeds the risk
of bypassing the normal approval process.
Trigger criteria (at least one must apply):
Active security incident requiring immediate containment
CUI-scope service outage affecting contract delivery obligations
Active ransomware spread that change will contain
Approval: on-call IT Manager + CISO verbal approval (document within 24 hours)
Implementation: proceed immediately; document within 24 hours
Retrospective review: mandatory at next CAB meeting
Evidence: emergency change record in EV-D21 within 24 hours of implementation
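The decision guide above can be summarised as a short classification function. This is a sketch only: the three boolean attributes are hypothetical simplifications of the criteria, and the order of precedence (Emergency, then Major, then Standard, with Normal as the default) is an assumption, although defaulting to Normal follows the guidance not to force-fit a change into Standard.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    # Exact match to a pre-approved type in the standard change library.
    matches_standard_catalogue: bool = False
    # At least one Emergency trigger criterion applies.
    active_incident_or_outage: bool = False
    # Affects security posture, the CUI system boundary, or the SSP description.
    affects_boundary_or_ssp: bool = False


def change_category(change: ChangeRequest) -> str:
    """Return Standard, Normal, Major, or Emergency for a proposed change."""
    if change.active_incident_or_outage:
        return "Emergency"
    if change.affects_boundary_or_ssp:
        return "Major"
    if change.matches_standard_catalogue:
        return "Standard"
    # Anything not exactly matching a standard change type is Normal.
    return "Normal"
```

A firewall rule modification with no boundary impact, for example, sets none of the flags and so falls through to Normal.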
RFC creation and content requirements
All Normal and Major changes require a Request for Change (RFC)
created in the ITSM platform before implementation.
RFC mandatory fields:
1. RFC title
Short descriptive title: "[System] — [Type of change] — [Date]"
Example: "PRODDB01 — SQL Server 2022 Security Patch — 2024-03-15"
2. Change description
What is being changed, specifically:
- Which system(s)
- What component (OS / application / configuration / network)
- What is the current state
- What will be the new state after the change
3. Business justification
Why is this change needed now?
What is the risk of not making this change?
4. Security Impact Assessment (SIA)
Complete the SIA template embedded in the RFC form.
For configuration changes, include:
a) CUI scope impact: Does this change affect a system that stores,
processes, or transmits CUI? If Yes: which CUI categories?
b) Security controls affected: Which NIST 800-171 controls does
this change affect (add, remove, or modify)?
Reference the specific AT-[family] page if relevant.
c) New attack surface: What new connectivity, service, account,
or data flow does this change introduce?
d) Failure risk: What happens if the change fails mid-implementation?
Is partial completion more dangerous than either original state
or target state?
e) Dependencies: What upstream and downstream systems could be
affected by this change?
f) Testing: Has this been tested in a non-production environment?
If Yes: describe test environment and outcomes.
If No: justify why testing is not feasible.
g) Post-change monitoring: What should be monitored in the 24 hours
after implementation to detect adverse effects?
5. Implementation plan
Step-by-step implementation instructions specific enough that a
different IT Operations engineer could execute the change:
- Commands to run
- Configuration values to set
- Verification steps after each significant step
- Expected output at each step
6. Rollback plan
How will the change be reversed if it fails or causes adverse effects?
- Rollback trigger criteria (what conditions warrant rollback?)
- Rollback steps (specific commands/actions)
- Rollback verification (how do you confirm rollback was successful?)
- Rollback decision maker (who authorises rollback?)
For Major changes: rollback must be tested in staging before production
7. Maintenance window
When will the change be implemented?
What is the maximum duration? (if exceeded → rollback trigger)
Is a service outage expected? If yes: who has been notified?
8. Post-implementation review
When will the post-implementation review occur? (within 48 hours for Normal)
What will be checked? (link to verification steps from implementation plan)
Who will conduct it?
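A completeness check over the eight mandatory fields can catch incomplete RFCs before they reach the CAB. The sketch below assumes an RFC is available as a dictionary; the field keys and the function name `rfc_problems` are hypothetical and would need mapping to the actual ITSM form fields. The title pattern follows the format given in field 1.

```python
import re

MANDATORY_FIELDS = (
    "title",
    "description",
    "business_justification",
    "security_impact_assessment",
    "implementation_plan",
    "rollback_plan",
    "maintenance_window",
    "post_implementation_review",
)

# "[System] <dash> [Type of change] <dash> [YYYY-MM-DD]"; \u2014 is the dash
# character used in the title format above.
TITLE_PATTERN = re.compile(r"^.+ \u2014 .+ \u2014 \d{4}-\d{2}-\d{2}$")


def rfc_problems(rfc: dict) -> list:
    """Return a list of problems; an empty list means the RFC looks complete."""
    problems = [
        f"missing field: {name}"
        for name in MANDATORY_FIELDS
        if not str(rfc.get(name) or "").strip()
    ]
    title = str(rfc.get("title") or "").strip()
    if title and not TITLE_PATTERN.match(title):
        problems.append("title does not follow the required format")
    return problems
```

This only checks that fields are present; it cannot judge whether the SIA or rollback plan is adequate, which remains a CAB decision.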
CAB meeting procedure
CAB meetings: every Wednesday 10:00–11:00 (or on-demand for urgent items)
Chair: IT Manager
Attendees: CISO, Network Engineer, rotating business representative,
change requestors for items on the agenda
Agenda for each CAB meeting:
1. Review of emergency changes since last CAB (5 minutes)
- Any emergency changes implemented since last meeting
- Confirm retrospective documentation is complete
- Assess whether emergency change revealed a process gap
2. Post-implementation reviews from previous week (10 minutes)
- Any Normal changes implemented last week
- Were there unexpected effects?
- Is there a follow-up ITSM ticket if issues were found?
3. Upcoming changes — review and approval (main agenda item):
For each RFC submitted for this CAB:
Requestor presents: what, why, SIA summary, rollback plan
CAB questions and challenge
Decision: Approved / Approved with conditions / Deferred / Rejected
Conditions (if applicable): record specific conditions in RFC
4. Forward look — Major changes planned (5 minutes)
Awareness item for upcoming Major changes requiring 5-day review
CAB minutes:
Documented in ITSM as a linked note to each RFC reviewed
Separately filed as a CAB meeting record in EV-D21
Minutes must show: attendees, each RFC reviewed, decision, conditions
Change implementation and evidence (generates EV-D21)
For every Normal or Major change:
Pre-implementation checklist (complete within 1 hour before start):
[ ] RFC status: Approved (not Draft or Under Review)
[ ] Maintenance window confirmed with all affected stakeholders
[ ] Rollback plan reviewed and rollback steps accessible
[ ] Backup/snapshot of affected system taken (where technically feasible):
VM: snapshot via hypervisor before change
Config file: git commit or manual backup of current config
Database schema: schema export before change
[ ] SIEM alert enhancement: notify SIEM team to monitor for
change-related anomalies during and after implementation window
During implementation:
Log each step as it is executed in the RFC implementation log field
Record: timestamp, step taken, observed result, pass/fail
If unexpected result: pause, assess, consult rollback criteria
Do NOT proceed through unexpected results without assessing whether
rollback should be triggered. The definition of rollback trigger
criteria in the RFC is there specifically for this moment.
Post-implementation verification:
Execute the verification steps from the RFC
For each verification check:
Expected result: [from RFC]
Actual result: [observed]
Match: Yes / No
If any verification check fails:
Assess: is this a critical failure (rollback) or a known side-effect (accept)?
If rollback: execute rollback plan; document what failed and why
If accept: document why the failure is acceptable and any follow-up needed
Complete the EV-D21 change record:
EV-D21 Change Management Record — RFC-[YYYY-NNN]
RFC title: [from RFC]
Change category: Standard / Normal / Major / Emergency
Requestor: [name]
Implemented by: [engineer name]
CAB approval: [approver names and date] / Standard change (pre-approved) /
Emergency (verbal approval: [name + date])
Maintenance window: [start date/time] to [end date/time]
Actual implementation: [start] to [end]
SIA reference: [SIA completed: Yes / Not required for Standard]
CUI scope affected: Yes / No — [if Yes: which CUI categories]
Implementation outcome:
[ ] Completed as planned
[ ] Completed with minor deviations (document below)
[ ] Partially completed — follow-up required (document below)
[ ] Rolled back (document below)
Deviations/issues: [description or "None"]
Rollback executed: Yes / No
If Yes: rollback trigger, rollback steps taken, result
Post-implementation verification:
[Verification check 1]: Expected [X] / Actual [X] — Pass/Fail
[Verification check 2]: Expected [X] / Actual [X] — Pass/Fail
[...]
Follow-up ITSM tickets raised: [list or "None"]
SSP update required: Yes (within 30 days) / No
IT Manager review: _________________ Date: _________
File at: EV-D · Config Management → Change Log → [YYYY]
OP-05 · BCM and DR Testing Procedures
Purpose and scope
This procedure governs the testing of the organisation's Business Continuity and Disaster Recovery capabilities. It implements ISO 27001 Annex A 5.29 (information security during disruption), Annex A 5.30 (ICT readiness for business continuity), and NIST 800-171 3.6.3 (test the organisational incident response capability in the context of DR-scale incidents).
Evidence generated: BCM exercise records, DR test records (filed under EV-A and linked from AT-IR EV-D15).
BCM/DR test programme — annual schedule
The following tests are conducted each year. The programme escalates
in complexity from desk-based to full technical exercise:
Q1 — BCP tabletop exercise
Format: facilitated discussion (2 hours)
Scenario: complete loss of primary office (fire / flood / denial of access)
Participants: all department heads, CISO, IT Manager, HR Manager
Focus: manual procedures, communication cascade, staff welfare,
decision-making without IT systems for first 4 hours
Q2 — IT failover test (technical — partial)
Format: planned, controlled failover test during a maintenance window
Scenario: primary server room is unavailable — activate cloud DR environment
Participants: IT Operations team only (business not involved)
Focus: can we fail over to cloud infrastructure? what recovery time do we actually achieve against the RTO?
Systems tested: DNS failover, identity platform (Entra ID — already cloud),
file server failover, email continuity
Q3 — Full DR exercise (technical + business)
Format: half-day exercise (business working from DR environment)
Scenario: primary infrastructure unavailable — staff working on DR systems
Participants: IT Operations + representative staff from each department
Focus: can the business function on DR infrastructure?
what breaks that we did not expect?
Q4 — Cyber incident tabletop (combined IR + BCM)
Format: facilitated scenario exercise (3 hours)
Scenario: ransomware attack affecting CUI-scope systems
Participants: full IRT + senior management (Executive Sponsor invited)
Focus: incident response + business continuity simultaneously;
external reporting decisions under time pressure
This exercise also satisfies AT-IR 3.6.3 (test IR capability)
Q1 — BCP tabletop exercise procedure
Pre-exercise preparation (CISO — 2 weeks before):
Scenario selection:
Select a realistic scenario relevant to the organisation's location and risk profile
Common scenarios:
Fire in building — staff cannot access premises for 5 days
Major flood — building inaccessible for 2 weeks; some equipment damaged
Extended power failure — 48-hour outage in the building and surrounding area
Key person unavailability — CISO and IT Manager both unavailable for 10 days
Pandemic or public health emergency — all staff working remotely for 30 days
The scenario should NOT be the same as the previous year's tabletop
Inject preparation:
Prepare 8–10 "injects" (new developments that emerge during the exercise)
Example injects for office fire scenario:
T+30 min: Fire service confirms building inaccessible for at least 72 hours
T+2 hours: IT Manager calls to say their laptop was left in the building
T+4 hours: Customer calls asking for update on a time-sensitive contract deliverable
T+24 hours: Insurance company asks for a list of affected assets
T+48 hours: Contract authority (MOD or US government) asks for a formal situation report
T+72 hours: Fire service extends access denial — now estimating 7 days minimum
Injects should force decisions, not just provide information
Participant briefing (1 week before):
Send pre-reading: current BCP summary (key decisions, contact lists, priorities)
Confirm attendance — all department heads required
Set expectations: this is a learning exercise, not a test of performance
Exercise facilitation:
Introduction (10 minutes):
CISO explains the exercise format
Ground rules:
Treat it as real — make the decisions you would actually make
No mobiles / email during exercise (forces reliance on BCP procedures)
All decisions documented by designated scribe
Set the scene: read the initial scenario brief
Exercise phases (90 minutes):
Phase 1 — Initial response (T+0 to T+4 hours simulated time):
How do we account for all staff?
How do we communicate with staff who don't know what happened?
Who contacts the customer? What do we tell them?
Where do people work? Do we have remote working capability for everyone?
Phase 2 — Short-term continuity (T+4 hours to T+48 hours):
What systems do we absolutely need? In what order?
Are all staff able to work remotely with current equipment?
How do we handle staff without company laptops?
What are our contractual obligations we must meet in the next 48 hours?
Phase 3 — Extended disruption (T+48 hours to T+7 days):
Do we need temporary premises?
Do we need to notify regulators or contracting authorities?
What does week 2 look like if the building remains inaccessible?
How do we handle payroll if it falls due during the disruption?
Inject delivery: CISO introduces each inject at appropriate times
Debrief (20 minutes):
What went well?
What decisions were hardest to make and why?
What information did you need that you could not quickly find?
What BCP documentation changes would have helped?
What actions should we take before the next exercise?
Post-exercise (within 10 days):
CISO produces exercise report:
Scenario description and objectives
Participant list
Timeline of decisions made during exercise
Findings — strengths: what worked
Findings — gaps: what did not work or was unclear
Action items: specific, owned, dated
Action items entered in EV-A04 (corrective action register)
BCP document updated where gaps were identified
Exercise report filed and linked from AT-IR EV-D15
Q2 — IT failover test procedure
Pre-test planning
Test objective: verify that CUI-scope systems can fail over to the
cloud DR environment and that recovery time meets the RTO target.
RTO targets (from BCP):
Identity / authentication (Entra ID): already cloud — continuous availability
Email (Exchange Online): already cloud — continuous availability
CUI file server: RTO target [4 hours] — time from declared DR to
staff able to access CUI files via DR environment
CUI database: RTO target [8 hours]
Development environment: RTO target [24 hours] (not critical path)
DR environment description:
Cloud subscription: [DR Azure subscription / AWS account — specify]
This is a separate subscription from production
Pre-provisioned with: [list pre-provisioned resources — VMs, storage, networking]
DNS: [failover DNS configuration — describe how DNS is switched]
VPN: [failover VPN endpoint — pre-configured or needs activation?]
Test scope:
Systems tested this quarter: CUI file server + DNS failover
Systems excluded from test: database (tested in Q3 full exercise)
Maintenance window:
Duration: [4 hours minimum — allow for test + rollback]
Schedule: Saturday morning 06:00–10:00 (minimise business impact)
Notification: IT Manager notifies CISO and department heads 1 week before
Users affected: all staff who access CUI files
User communication: "Planned IT maintenance Saturday 06:00–10:00 —
CUI file server unavailable during this period"
Failover test — step by step
PRE-TEST (Friday evening before test):
[ ] Confirm last backup to cloud (offsite Copy 3) completed successfully
[ ] Confirm DR environment is powered on and reachable:
ping [DR-jumphost-IP] from IT Operations workstation
[ ] Confirm DR environment DNS records are pre-configured:
nslookup [fileserver.internal] [DR-DNS-IP] → should resolve to DR IP
[ ] Confirm test accounts have access to DR environment:
Test login to DR jump host via PAM
[ ] Send test start notification to CISO
TEST EXECUTION (06:00 Saturday):
T+00:00 — Initiate failover
Step 1: Update DNS to point fileserver.internal to DR IP
[DNS change commands or console steps — specify for deployed DNS]
Step 2: Verify DNS propagation:
On 3 different clients: nslookup fileserver.internal
Expected: DR IP address returned
Time for propagation to complete: [expected time — depends on TTL]
T+00:30 — Verify file access via DR environment
Log into DR environment via PAM
Confirm: CUI file share is accessible
Confirm: file listing matches last known production state
Confirm: a test file can be read successfully
Confirm: a test file can be written (if write access is required in DR)
Record: time from DNS change to confirmed file access = [actual RTO]
T+01:00 — Stress test (optional — if time permits)
Have 3 test users simultaneously access files via DR environment
Confirm performance is acceptable (not necessarily production-equivalent)
T+01:30 — Initiate rollback to production
Step 1: Revert DNS to production IP
Step 2: Verify DNS propagation back to production:
nslookup fileserver.internal → production IP returned
Step 3: Verify production file server is accessible:
Test file read from production server
Step 4: Confirm whether any data written in the DR environment
must be synced back to production (none if DR was read-only
during the test; if the write test was run, sync back changes)
T+02:00 — Test complete
Verify: all users pointing to production server
Verify: DR environment can be powered down / returned to standby
Notify: CISO that the test is complete, including the outcome
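The DNS verification steps in the failover and rollback above can be scripted so that propagation time is measured rather than estimated. The following is a minimal sketch, assuming Python 3 is available on each test client. The names `resolve_ipv4` and `wait_for_dns`, the default timeout, and the polling interval are illustrative and not part of any deployed tooling. It polls the client's own system resolver, so it reflects that client's cache and should be run separately on each of the three clients.

```python
import socket
import time
from typing import Callable, Optional


def resolve_ipv4(hostname: str) -> Optional[str]:
    """Resolve a hostname with the local system resolver; None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def wait_for_dns(
    hostname: str,
    expected_ip: str,
    resolve: Callable[[str], Optional[str]] = resolve_ipv4,
    timeout_s: float = 600.0,   # assumed default; set relative to the record TTL
    interval_s: float = 15.0,   # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Poll until hostname resolves to expected_ip.

    Returns the seconds elapsed when the expected address is first seen,
    or None if the timeout passes first.
    """
    start = clock()
    while True:
        if resolve(hostname) == expected_ip:
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

On a test client, calling `wait_for_dns("fileserver.internal", dr_ip)` with the DR address would return the observed propagation time in seconds for the test record, or `None` if the record has not changed within the timeout. The same call with the production address covers the rollback check.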
POST-TEST (within 48 hours):
Document test record:
Test date and time
Systems tested
Actual RTO achieved vs target: [X minutes] vs [target minutes]
Verification steps and results (pass/fail for each)
Issues encountered
Rollback success: Yes / No
Recommendations: what could improve RTO? what was harder than expected?
Action items → EV-A04
Test record filed under BCM exercise records
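The "actual RTO achieved vs target" entry in the test record is simple timestamp arithmetic. As a sketch only, it could be computed as below. The function name and the record fields are hypothetical, and the target passed in should be whatever the BCP states for the system under test.

```python
from datetime import datetime, timedelta


def rto_result(started: datetime, access_confirmed: datetime,
               target: timedelta) -> dict:
    """Compare the measured recovery time against the RTO target."""
    actual = access_confirmed - started
    if actual < timedelta(0):
        raise ValueError("access_confirmed is earlier than started")
    return {
        "actual_minutes": round(actual.total_seconds() / 60),
        "target_minutes": round(target.total_seconds() / 60),
        "target_met": actual <= target,
    }
```

For example, a failover initiated at 06:00 with file access confirmed at 06:42, measured against a 4-hour target, yields 42 actual minutes against 240 target minutes, with the target met.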
Q3 — Full DR exercise procedure
Exercise design
The full DR exercise involves actual staff working on DR infrastructure
for a defined period (target: 2 hours of productive work on DR systems).
This is more than an IT test — it validates that the business can actually
function, not just that IT systems are reachable.
Participants:
IT Operations: full team — manages the failover
Business participants: 2–3 representatives from each department
(select staff who regularly use CUI-scope systems)
Observers: CISO, IT Manager, HR Manager
Scope of systems in DR during exercise:
CUI file server: in DR — staff access files from DR environment
Email: not failed over (already cloud — continuous)
Intranet / Confluence: [failed over or continuous — specify]
CRM / business application: [specify if in DR scope]
VPN: DR VPN endpoint active — staff connect via DR VPN
Exercise scenario:
"Primary data centre is unavailable due to a power infrastructure failure.
Recovery time for primary site is estimated at 6 hours.
We are now operating on our DR environment.
Your task for the next 2 hours is to continue your normal work
activities using the DR environment."
This is intentionally mundane. The goal is to discover what breaks
during normal work on DR systems, not to test crisis response.
Exercise execution
T-2 weeks: notify participants; provide pre-reading on DR access procedures
T-1 week: confirm DR environment is in expected state
T-1 day: confirm all participants have tested DR access credentials
(if using DR-specific credentials rather than production SSO)
T+00:00 — CISO declares exercise start
IT Operations initiates failover:
DNS failover to DR (same procedure as Q2 test)
VPN failover to DR endpoint
Any application-level failover steps
IT Operations monitors for failover completion:
Confirm all target systems accessible in DR
Record: time from declaration to DR ready = [actual RTO]
T+00:30 — Business participants begin work
Staff connect via DR VPN
Staff access CUI file server via DR environment
Staff conduct normal work activities
IT Operations monitors:
Support tickets / instant messages from participants (expect some!)
SIEM monitoring (DR environment should still forward logs)
DR system performance
T+00:30 to T+02:30 — Active exercise
CISO and IT Manager observe and document:
Which functions work without issues?
Which functions are degraded?
Which functions are completely unavailable?
What workarounds are staff using (and are those workarounds acceptable)?
What questions are staff asking that indicate the DR procedures are unclear?
T+02:30 — CISO declares exercise end; initiate failback to production
IT Operations conducts planned failback:
Sync any data created in DR environment back to production
Revert DNS to production
Verify production systems are accessible
Confirm DR environment returned to standby
Record: time from exercise-end declaration to production restored
T+03:00 — Hot debrief (30 minutes with all participants)
Questions:
What did you try to do that did not work?
What was your biggest frustration with DR working?
What would have helped you work more effectively?
Were the DR access instructions clear?
Exercise report (within 10 business days):
Objective vs achievement: [target RTO: X] vs [actual: Y]
Functions fully available in DR: [list]
Functions degraded in DR: [list with specific issues]
Functions unavailable in DR: [list — each needs a plan or a risk acceptance]
Staff feedback summary: [key themes]
Findings and action items: [specific improvements with owners and dates]
DR documentation gaps identified: [list]
Action items → EV-A04
DR runbook updated where gaps identified
Report filed and presented at next management review
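The report template requires that every function found unavailable in DR has either a plan or a risk acceptance. A minimal sketch of that completeness check follows, assuming the observers' findings are captured as structured records. The `Availability` and `FunctionFinding` types and the `disposition` field are hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Availability(Enum):
    FULL = "fully available"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"


@dataclass
class FunctionFinding:
    name: str
    availability: Availability
    # Reference to a remediation plan or a risk acceptance, if one exists.
    disposition: Optional[str] = None


def unresolved_unavailable(findings: List[FunctionFinding]) -> List[str]:
    """Names of unavailable functions with no plan or risk acceptance yet."""
    return [
        f.name
        for f in findings
        if f.availability is Availability.UNAVAILABLE and not f.disposition
    ]
```

An empty result would indicate that the "Functions unavailable in DR" list is complete enough to file. Any names returned still need an owner decision before the report is presented.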
Q4 — Combined cyber incident and BCM tabletop
This exercise satisfies both AT-IR 3.6.3 (test IR capability) and the
BCM programme's Q4 exercise requirement. See AT-IR EV-D15 for the exercise record template.
Additional BCM-specific injects for the Q4 exercise:
Alongside the cyber incident scenario (ransomware / significant breach),
include injects that force BCM decisions:
T+6 hours: "The ransomware has encrypted the primary file server.
IT has confirmed that backup is intact but restoration
will take 12–16 hours. Should we fail over to DR
in the interim?"
T+8 hours: "Two engineers are ill — we have reduced IT capacity
at exactly the moment we need maximum capacity.
Who can we call in? What is our minimum viable IT team?"
T+12 hours: "The contracting authority (MOD / US DoD) is asking
whether our ability to fulfil contract obligations
is affected. What do we tell them?"
T+24 hours: "The incident has been contained but systems are not
yet restored. Tomorrow is the payroll processing date.
The payroll system is in the affected environment.
What is our contingency?"
These injects force the IRT to think about business continuity
simultaneously with incident response — which is the reality of a
major cyber incident.
BCM-specific findings from Q4 exercise are documented in the exercise
report alongside IR-specific findings. BCM action items → EV-A04.
BCP document updated based on findings.
Post-exercise follow-up — closing the loop
Every BCM/DR exercise must produce improvement, not just documentation.
The following loop ensures exercises drive actual change:
Within 24 hours of exercise:
Hot debrief notes captured (can be rough notes)
Critical findings escalated immediately (don't wait for the formal report)
Within 10 business days of exercise:
Formal exercise report produced by CISO
Action items entered in EV-A04 with named owners and due dates
Report presented at next available management meeting
Within 30 days of exercise:
BCP/DR documentation updated for any gaps identified
Training updates for any procedure that was poorly followed
Technical improvements scheduled (as Normal changes via RFC)
Before next exercise of the same type (12 months):
All action items from the previous exercise should be closed or
have a documented, CISO-approved extended timeline
If the same gap appears in two consecutive exercises, it is
escalated to the CISO as a persistent compliance risk
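The follow-up deadlines above can be derived from the exercise end time. The sketch below assumes that business days are Monday to Friday and ignores public holidays, which the procedure does not define. The function names and dictionary keys are hypothetical.

```python
from datetime import date, datetime, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance by the given number of Mon-Fri days (no holiday calendar)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def follow_up_deadlines(exercise_end: datetime) -> dict:
    """Deadlines for the 24-hour, 10-business-day and 30-day follow-ups."""
    end_date = exercise_end.date()
    return {
        "hot_debrief_notes": exercise_end + timedelta(hours=24),
        "formal_report": add_business_days(end_date, 10),
        "documentation_updates": end_date + timedelta(days=30),
    }
```

For an exercise ending on Saturday 15 March 2025 at 09:00, this gives debrief notes due by 09:00 on 16 March, the formal report due by Friday 28 March, and documentation updates due by 14 April.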
Evidence filing:
Exercise reports: EV-D · BCM → Exercises → [YYYY] → [Q1/Q2/Q3/Q4]
Action items: EV-A · Management System → Corrective Actions (EV-A04)
DR test technical records: EV-D · BCM → DR Tests → [YYYY-Q#]
Q4 exercise record: also filed as EV-D15 (AT-IR evidence)
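The filing locations above follow a fixed pattern, so they can be generated from the exercise date to avoid misfiled records. This is an illustrative sketch that assumes calendar quarters (January to March is Q1); if the organisation files by fiscal quarter, `quarter_of` would need adjusting. The function names are hypothetical.

```python
from datetime import date


def quarter_of(d: date) -> int:
    """Calendar quarter (1-4) for a date; assumes Q1 starts in January."""
    return (d.month - 1) // 3 + 1


def exercise_report_path(d: date) -> str:
    """Filing location for a BCM exercise report."""
    return f"EV-D · BCM → Exercises → {d.year} → Q{quarter_of(d)}"


def dr_test_record_path(d: date) -> str:
    """Filing location for a DR test technical record."""
    return f"EV-D · BCM → DR Tests → {d.year}-Q{quarter_of(d)}"
```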
Evidence summary — all IT Operations procedures
| Evidence ID | Procedure | What it is | Frequency | Owner | Location |
|---|---|---|---|---|---|
| EV-D27 | OP-01 Backup | Daily backup job completion log | Daily | IT Operations | EV-D → BCM → Backup Logs |
| EV-D28 | OP-01 Backup | Quarterly restoration test record | Quarterly | IT Manager | EV-D → BCM → Restoration Tests → [YYYY-Q#] |
| EV-D30 | OP-02 Certs | Certificate and key inventory | Monthly review | IT Operations | EV-D → Cryptography → Certificate Inventory |
| EV-F01 | OP-03 SIEM | Monthly SIEM log review | Monthly | Security Analyst | EV-F → Continuous Monitoring → Log Reviews |
| EV-F06 | OP-03 SIEM | Monthly SIEM health report | Monthly | IT Manager | EV-F → Continuous Monitoring → SIEM Health |
| EV-D21 | OP-04 Change | Change management records (RFCs) | Per change | IT Manager | EV-D → Config Management → Change Log |
| BCM exercise records | OP-05 BCM | Quarterly BCM/DR exercise reports | Quarterly | CISO | EV-D → BCM → Exercises → [YYYY] |
| EV-D15 | OP-05 BCM | Q4 combined exercise (also AT-IR evidence) | Annual | CISO | EV-D → Incident Response → Exercise Records |
IT Operations Procedures Library — Owner: IT Manager. Evidence cross-references: AT-MP (backup encryption), AT-SC-ENC (certificate requirements), AT-AU (SIEM), AT-CM (change management), AT-IR (BCM/DR testing). Questions: IT Manager or CISO.