ServiceNow Event Management: Proactive Detection and Incident Automation
Event Management turns ServiceNow from a reactive ticketing system into a proactive monitoring nerve center. Instead of waiting for users to report outages, it catches failures from infrastructure, cloud providers, and custom applications — and can create incidents automatically before anyone notices. Here's how to set it up properly.
What Event Management Actually Does
ServiceNow Event Management (EM) ingests alerts from monitoring tools, normalizes them, correlates them against the CMDB, and triggers automated actions — incident creation, email notifications, webhook calls, or chat alerts. The goal is to close the gap between "something breaks" and "someone knows about it."
The key components:
- Event sources: where alerts come from (MID Server + SNMP, cloud APIs, webhook endpoints, third-party tools like Nagios or Datadog)
- Event records: stored in the em_event table, each tied to a CI from the CMDB
- Event rules: match incoming events and define what happens next (create incident, suppress, notify)
- Correlation: groups related events to prevent a storm of duplicate incidents
Without Event Management, your monitoring tools generate alerts that sit in dashboards. With it, those alerts become structured, CI-linked, actionable records that drive your ITSM process automatically.
Event Sources: How Events Enter ServiceNow
MID Server + SNMP (On-Premises Infrastructure)
The most common on-premises event source. Network devices send SNMP traps to the MID Server, which can also poll them with SNMP v2c/v3 queries. Configure the probe in ServiceNow under Event Management > Sources > MID Server, then enable SNMP traps on your network devices pointing to the MID Server's IP.
For SNMP v3 (recommended for production):
// MID Server SNMPv3 probe configuration
// In the MID Server's probe config XML or via the UI:
// Host: your.target.device.ip
// Protocol: SNMPv3
// Security Name: em_monitor
// Auth Protocol: SHA256
// Priv Protocol: AES256
// Context: sn-tricks-monitoring
The MID Server translates SNMP traps into em_event records automatically, mapping the trap OID to a ServiceNow event definition.
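The OID-to-definition mapping can be pictured as a simple lookup table. The sketch below is plain JavaScript for illustration, not MID Server code: the three OIDs are the standard SNMPv2 generic notifications, while the event names, the input shape, and the `mapTrapToEvent` helper are hypothetical.

```javascript
// Standard SNMPv2 generic notification OIDs mapped to hypothetical event names.
var TRAP_EVENT_MAP = {
    '1.3.6.1.6.3.1.1.5.1': 'snmp:cold_start',
    '1.3.6.1.6.3.1.1.5.3': 'snmp:link_down',
    '1.3.6.1.6.3.1.1.5.4': 'snmp:link_up'
};

// Hypothetical helper: turn a received trap into an event-like object.
// The trap shape ({ oid, agentAddress }) is an assumption for this sketch.
function mapTrapToEvent(trap) {
    var known = Object.prototype.hasOwnProperty.call(TRAP_EVENT_MAP, trap.oid);
    return {
        source: 'mid_server',
        node: trap.agentAddress,
        event_name: known ? TRAP_EVENT_MAP[trap.oid] : 'snmp:unknown_trap',
        description: 'SNMP trap ' + trap.oid + ' from ' + trap.agentAddress
    };
}
```

Traps with an unmapped OID fall through to a catch-all name, which mirrors the idea that an event without a matching definition is stored but not acted on.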
Cloud Monitoring (AWS, Azure, GCP)
Cloud events arrive via ServiceNow's cloud-native integrations. These don't require a MID Server — ServiceNow connects directly via OAuth or API keys.
AWS CloudWatch integration:
- Go to Event Management > Sources > Amazon CloudWatch
- Authenticate with an AWS IAM role that has CloudWatch read permissions
- Select the AWS account link and the regions to monitor
- Configure which CloudWatch alarm states trigger events (OK, ALARM, INSUFFICIENT_DATA)
Azure Monitor:
- Event Management > Sources > Microsoft Azure
- Register a ServiceNow app in Azure AD with the Monitoring Reader role
- Link the Azure subscription to your ServiceNow instance
Cloud events are normalized into the same em_event schema as SNMP events, so event rules handle them identically.
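To make the idea of a shared schema concrete, here is a minimal sketch of two normalizers that emit the same keys. It is illustrative JavaScript, not the platform's implementation: the state-to-severity mapping, the `resourceId` argument, and the SNMP trap input shape are assumptions; the `AlarmName`, `NewStateValue`, `NewStateReason`, and `StateChangeTime` fields follow the CloudWatch alarm notification format.

```javascript
// Hypothetical mapping from CloudWatch alarm states to severity labels.
var STATE_TO_SEVERITY = {
    ALARM: 'CRITICAL',
    INSUFFICIENT_DATA: 'WARNING',
    OK: 'CLEAR'
};

// Normalize a CloudWatch alarm notification into the common event shape.
// resourceId is assumed to have been extracted from the alarm's dimensions upstream.
function normalizeCloudWatchAlarm(alarm, resourceId) {
    var known = Object.prototype.hasOwnProperty.call(STATE_TO_SEVERITY, alarm.NewStateValue);
    return {
        source: 'aws-cloudwatch',
        node: resourceId,
        event_name: alarm.AlarmName,
        severity: known ? STATE_TO_SEVERITY[alarm.NewStateValue] : 'WARNING',
        description: alarm.NewStateReason,
        time: alarm.StateChangeTime
    };
}

// Normalize an SNMP trap (hypothetical input shape) into the same event shape.
function normalizeSnmpTrap(trap) {
    return {
        source: 'mid_server',
        node: trap.agentAddress,
        event_name: trap.name,
        severity: trap.severity,
        description: 'SNMP trap ' + trap.oid,
        time: trap.receivedAt
    };
}
```

Because both functions return objects with identical keys, anything downstream (rules, correlation) can ignore where the event came from.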
Webhook Ingestion (Custom Applications)
For custom applications and in-house tools, push events directly via the Event API:
// Create a custom event from application code
// POST to /api/em/event
var eventPayload = {
    source: 'custom-app-api',
    event_class: 'Application',
    node: 'api-gateway-01',
    metric_name: 'error_rate',
    metric_value: 15.4,
    severity: 'WARNING',
    description: 'API Gateway error rate exceeded 10% threshold',
    time: new Date().toISOString()
};
// Or, from a server-side script running on the instance
// (serverSysId holds the sys_id of the affected cmdb_ci_server record):
var em = new EventMgmt();
em.createEvent('cmdb_ci_server', serverSysId, 'custom:error_rate_high',
    'API Gateway error rate at 15.4%', 2);
ServiceNow's Scripted REST API at /api/em/event accepts JSON payloads and maps them to em_event records. This is the cleanest way to integrate custom monitoring.
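Before a payload is pushed, it helps to validate it on the sending side so malformed events never reach the instance. The following is a minimal sketch, assuming the endpoint path and field names used in this article; the required-field list, the allowed severity labels, and the `buildEventRequest` helper are hypothetical and should be checked against your instance.

```javascript
// Assumptions for this sketch: which fields are mandatory and which severity labels are valid.
var REQUIRED_FIELDS = ['source', 'node', 'severity', 'description'];
var ALLOWED_SEVERITIES = ['CRITICAL', 'MAJOR', 'MINOR', 'WARNING', 'INFO'];

// Validate a payload and describe the HTTP request that would deliver it.
// Nothing is sent here; pass the result to the HTTP client of your choice.
function buildEventRequest(instanceUrl, payload) {
    var missing = REQUIRED_FIELDS.filter(function (field) {
        return payload[field] === undefined || payload[field] === '';
    });
    if (missing.length > 0) {
        throw new Error('Missing required fields: ' + missing.join(', '));
    }
    if (ALLOWED_SEVERITIES.indexOf(payload.severity) === -1) {
        throw new Error('Unsupported severity: ' + payload.severity);
    }
    return {
        url: instanceUrl.replace(/\/+$/, '') + '/api/em/event',
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'Accept': 'application/json' },
        body: JSON.stringify(payload)
    };
}
```

Keeping validation separate from transport makes the function easy to unit test and lets the same check run in every application that emits events.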
Third-Party Monitoring Tools (Nagios, Zabbix, Datadog, PagerDuty)
ServiceNow ships out-of-the-box integrations with major monitoring platforms. Most work by installing a MID Server-side plugin or by configuring an alert channel in the third-party tool pointing to ServiceNow's webhook endpoint.
For Nagios (via NRDP or check_mk):
- Configure Nagios to POST check results to the MID Server's probe endpoint
- The MID Server normalizes these into em_event records
For Datadog:
- Use the ServiceNow Datadog Integration tile from the ServiceNow Store
- Configure the Datadog webhook notification to point to your instance
Creating Event Definitions
An event definition tells ServiceNow how to interpret a raw event. Without a matching definition, incoming events are stored but won't trigger rules.
Navigate to Event Management > Configuration > Event Definitions and create one:
Name: CPU Utilization High
Source: mid_server
Event Name: cpu_utilization_high
Description: CPU utilization exceeded threshold on ${node}
Severity Mapping:
- 80-90%: WARNING
- 91-100%: CRITICAL
CI Type: cmdb_ci_server
Event definitions also define the cooldown period: how long to wait before re-alerting on the same CI. Set this to avoid alert fatigue from flapping thresholds. A 15-minute cooldown on a CPU alert is reasonable; a 4-hour cooldown on a disk-space alert might be too aggressive.
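The severity bands and the cooldown can be expressed as two small functions. This is an illustrative sketch of the logic only, assuming the thresholds from the example definition above; `severityForCpu` and `shouldAlert` are hypothetical names, not platform APIs.

```javascript
// Map a CPU utilization percentage to the severity bands in the example definition.
// Values between 90 and 91 are treated as WARNING, since the definition leaves that gap open.
function severityForCpu(percent) {
    if (percent >= 91) {
        return 'CRITICAL';
    }
    if (percent >= 80) {
        return 'WARNING';
    }
    return null; // below threshold: no event
}

// Decide whether a new alert may fire for a CI, given when the last one fired.
// lastAlertMs is null when the CI has never alerted; times are epoch milliseconds.
function shouldAlert(lastAlertMs, nowMs, cooldownMinutes) {
    if (lastAlertMs === null) {
        return true;
    }
    return nowMs - lastAlertMs >= cooldownMinutes * 60 * 1000;
}
```

With a 15-minute cooldown, a CPU that flaps around the threshold every two minutes produces one alert per quarter hour instead of one per flap.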
Event Rules: From Event to Incident
Event rules are the logic layer. They match incoming events and define the response. Each rule has:
- Conditions — which events it applies to (source, severity, CI type, node pattern)
- Filter — additional script-based conditions (e.g., only fire if the CI is in production)
- Actions — what happens (create incident, send notification, suppress, webhook)
Creating an Incident from an Event
Rule configuration:
Name: Auto-Create Incident from Critical Events
Order: 100
Conditions:
- Event Name: matches pattern *critical* OR *down* OR *unreachable*
- Severity: CRITICAL
- Source: INCLUDES 'mid_server'
Filter (script):
// Only fire for production CIs
var ci = new GlideRecord('cmdb_ci');
// get() returns false when no matching CI exists, so unknown nodes never fire
if (!ci.get(current.node)) {
    return false;
}
return ci.getValue('u_environment') == 'production';
Actions:
- Create Incident:
- Type: Auto-created from EM
- Category: Network / Server / Application (based on CI type)
- Assignment Group: mapped from CI's support group
- Urgency: HIGH
- Priority: 1 (for CRITICAL) / 2 (for WARNING)
- Short Description: ${event_name} on ${node}
- Description: ${description}
- CMDB CI: ${node}
- Notify: Assignment Group (email)
The incident is linked to the triggering event via the em_event reference on the incident record. This linkage is what powers correlation — ServiceNow can group all events from the same CI into a single incident.
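The rule above boils down to a match step followed by a field-mapping step. The sketch below shows that logic in plain JavaScript under stated assumptions: the event and rule object shapes, and the helpers `globToRegExp`, `matchesRule`, and `buildIncident`, are hypothetical and only mirror the configuration shown in this section.

```javascript
// Convert a wildcard pattern such as "*down*" into a case-insensitive RegExp.
function globToRegExp(glob) {
    var escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$', 'i');
}

// True when the event satisfies every condition of the rule.
function matchesRule(event, rule) {
    var nameMatches = rule.namePatterns.some(function (pattern) {
        return globToRegExp(pattern).test(event.event_name);
    });
    return nameMatches &&
        event.severity === rule.severity &&
        String(event.source).indexOf(rule.sourceIncludes) !== -1;
}

// Build the incident fields that the rule's action would populate.
function buildIncident(event) {
    return {
        short_description: event.event_name + ' on ' + event.node,
        description: event.description,
        priority: event.severity === 'CRITICAL' ? 1 : 2,
        cmdb_ci: event.node
    };
}

// The conditions from the example rule, expressed as data.
var criticalRule = {
    namePatterns: ['*critical*', '*down*', '*unreachable*'],
    severity: 'CRITICAL',
    sourceIncludes: 'mid_server'
};
```

Expressing conditions as data rather than code is what lets rules be ordered, tested, and switched between suppress and create-incident modes without touching scripts.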
Scripted Events: Custom Event Generation
For custom application monitoring, generate events directly from server-side scripts. This is more flexible than webhook ingestion and gives you full control over event properties.
// In a Script Include or Background Script
var eventMgmt = new EventMgmt();
// Create a custom event tied to a specific CI
var serverSysId = 'abc123def456'; // sys_id of the cmdb_ci_server record
var eventName = 'custom:order_processing_slow';
var description = 'Order processing time exceeded 5s SLA on web-server-03';
var priority = 2; // 1=critical, 2=high, 3=medium, 4=low
eventMgmt.createEvent('cmdb_ci_server', serverSysId, eventName, description, priority);
// If you need to pass additional metadata:
var eventGR = new GlideRecord('em_event');
eventGR.initialize();
eventGR.event_name = eventName;
eventGR.source = 'custom-order-service';
eventGR.node = 'web-server-03';
eventGR.description = description;
eventGR.priority = priority;
eventGR.u_order_id = 'ORD-20240608-001'; // custom field
eventGR.u_processing_time_ms = 5200;
eventGR.insert();
Custom fields on em_event are fully supported — add u_ prefixed columns to capture additional context (order IDs, transaction IDs, environment tags) that enriches the incident and helps with debugging.
Event Correlation: Preventing Alert Storms
When a single server goes down, your monitoring stack might fire 20 events simultaneously: CPU down, memory critical, disk unreachable, ping timeout, service stopped. Without correlation, you get 20 separate incidents. With correlation, you get one.
ServiceNow correlates events on three levels:
- CI-based correlation — All events from the same CI within the correlation window are grouped. Set the window under Event Management > Configuration > Settings > Correlation Window (default: 15 minutes).
- Pattern-based correlation — Events matching the same pattern (e.g., "db-server-*") are grouped even if from different nodes.
- Topology-aware correlation — Events are correlated based on CMDB relationships. If a load balancer goes down, events from all backend servers behind it are grouped under a single incident.
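CI-based correlation is the easiest of the three to reason about. The sketch below groups events per CI within a time window; it is a simplified illustration, not the platform's algorithm, and the event shape (`node`, `timeMs`) and the `correlateByCi` helper are assumptions.

```javascript
// Group events by CI: an event joins the open group for its node if it arrives
// within windowMinutes of that group's first event; otherwise it starts a new group.
function correlateByCi(events, windowMinutes) {
    var windowMs = windowMinutes * 60 * 1000;
    var sorted = events.slice().sort(function (a, b) {
        return a.timeMs - b.timeMs;
    });
    var openGroupByNode = Object.create(null);
    var groups = [];
    sorted.forEach(function (evt) {
        var group = openGroupByNode[evt.node];
        if (!group || evt.timeMs - group.startMs > windowMs) {
            group = { node: evt.node, startMs: evt.timeMs, events: [] };
            openGroupByNode[evt.node] = group;
            groups.push(group);
        }
        group.events.push(evt);
    });
    return groups;
}
```

Each returned group corresponds to one incident, so a burst of events from a single failing server collapses into a single ticket while a later, unrelated failure on the same CI opens a new one.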
Configuring Correlation Rules
Name: Database Tier Correlation
Priority: 50
Match Criteria:
- CI Relationship: depends on cmdb_ci_database
- Event Name Pattern: *oracle* OR *mysql* OR *database*
Window: 30 minutes
Action: Group into single incident
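One way to picture what a relationship-based rule like this does: walk each failing CI up its dependency chain and attribute its events to the highest ancestor that is also failing. The following is a simplified sketch under that assumption; the `dependsOn` map (child to parent), the event shape, and both helper functions are hypothetical, and real CMDB relationships can be many-to-many.

```javascript
function hasKey(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
}

// Walk up the dependency chain and return the highest ancestor that is also failing.
// The visited set guards against cycles in the relationship data.
function findRootCause(node, dependsOn, failing) {
    var root = node;
    var current = node;
    var visited = Object.create(null);
    while (hasKey(dependsOn, current) && !visited[current]) {
        visited[current] = true;
        current = dependsOn[current];
        if (failing[current]) {
            root = current;
        }
    }
    return root;
}

// Group event names under the root-cause CI of the node that raised them.
function correlateByTopology(events, dependsOn) {
    var failing = Object.create(null);
    events.forEach(function (evt) {
        failing[evt.node] = true;
    });
    var groups = Object.create(null);
    events.forEach(function (evt) {
        var root = findRootCause(evt.node, dependsOn, failing);
        if (!groups[root]) {
            groups[root] = [];
        }
        groups[root].push(evt.event_name);
    });
    return groups;
}
```

If the database is healthy and only the application servers alert, each keeps its own group, so the roll-up happens only when the shared dependency is itself reporting a problem.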
Monitoring and Debugging Event Pipelines
When events aren't behaving as expected, work backwards through the pipeline:
- Check the event arrived: Query em_event filtered by source, node, and event_name. If the event isn't here, the issue is upstream (MID Server connectivity, webhook delivery, cloud API authentication).
- Check the event rule matched: Each event record shows event_rule and rule_matched fields. If these are empty, the event didn't match any rule.
- Check the incident was created: The em_event record has an incident reference. If an incident should exist but doesn't, check the rule's action configuration.
- Check MID Server logs: For SNMP/probe-based events, $MID_HOME/logs/mid.log shows raw event ingestion and any translation errors.
// Debug script: find unprocessed events
(function debugEvents() {
    var events = new GlideRecord('em_event');
    events.addQuery('state', '!=', 'processed');
    events.addQuery('sys_created_on', '>', gs.daysAgo(1));
    events.orderByDesc('sys_created_on');
    events.query();
    gs.print('Unprocessed events in last 24h: ' + events.getRowCount());
    while (events.next()) {
        gs.print('---');
        gs.print('Event: ' + events.event_name.getDisplayValue());
        gs.print('Source: ' + events.source.getDisplayValue());
        gs.print('Node: ' + events.node.getDisplayValue());
        gs.print('State: ' + events.state.getDisplayValue());
        gs.print('Rule Matched: ' + events.event_rule.getDisplayValue());
        // A GlideElement is always truthy, so test for an empty reference with nil()
        gs.print('Incident: ' + (events.incident.nil() ? 'none' : events.incident.getDisplayValue()));
    }
})();
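The work-backwards checklist can also be written as a small classifier that reports the first stage at which an event stalled. This is a sketch over plain objects rather than GlideRecords; the field names follow this article, and `diagnoseEvent` and `summarize` are hypothetical helpers.

```javascript
// Return the first pipeline stage at which an event stalled.
// evt is a plain object copy of an event record, or null if the lookup found nothing.
function diagnoseEvent(evt) {
    if (!evt) {
        return 'not_received';        // upstream issue: source, MID Server, webhook
    }
    if (!evt.event_rule) {
        return 'no_rule_matched';     // rule conditions or event definition
    }
    if (!evt.incident) {
        return 'no_incident_created'; // rule action configuration
    }
    return 'ok';
}

// Count how many events stalled at each stage.
function summarize(events) {
    var counts = { not_received: 0, no_rule_matched: 0, no_incident_created: 0, ok: 0 };
    events.forEach(function (evt) {
        counts[diagnoseEvent(evt)] += 1;
    });
    return counts;
}
```

A summary dominated by one stage points at a systemic cause, such as a rule with an overly narrow condition, rather than a one-off delivery failure.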
Best Practices
- Use CI linkage on every event: Events without a CI reference can't be correlated. Always pass the node (CI sys_id) when creating scripted events.
- Set meaningful cooldowns: Flapping alerts destroy trust in the system. Tune cooldowns per event type, not globally.
- Limit incident creation per CI per window: Even with correlation, configure a maximum incidents-per-CI-per-day setting to prevent cascade failures from creating hundreds of tickets.
- Separate alert sources by environment: Your dev/staging environment shouldn't create production incidents. Use event rules that filter by CI environment tag.
- Use event severity mapping: Don't map everything to CRITICAL. Preserve the severity ladder so on-call routing can be based on actual priority.
- Test rules with suppression first: Create rules in "suppress only" mode initially, verify events are matching correctly, then switch to "create incident" action.
- Correlate at the dependency level: Use CMDB relationships for topology-aware correlation. Incidents grouped by infrastructure dependency are more actionable than those grouped by CI name pattern.
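The per-CI incident cap from the list above can be sketched as a small counter keyed by CI and calendar day. This is an illustrative, in-memory version only; `createIncidentLimiter` is a hypothetical helper, days are counted in UTC, and a real implementation would persist the counts.

```javascript
// Create a limiter that allows at most maxPerDay incidents per CI per UTC day.
function createIncidentLimiter(maxPerDay) {
    var counts = Object.create(null); // key: "<node>|<YYYY-MM-DD>"
    return function allowIncident(node, nowMs) {
        var day = new Date(nowMs).toISOString().slice(0, 10);
        var key = node + '|' + day;
        var used = counts[key] || 0;
        if (used >= maxPerDay) {
            return false; // cap reached: suppress or attach to an existing incident
        }
        counts[key] = used + 1;
        return true;
    };
}
```

A call such as `allowIncident('web-01', Date.now())` returns false once the cap for that CI is reached, at which point the event can be attached to an existing incident instead of opening a new one.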
Event Management done right means your operators wake up to incidents with full context — what CI failed, what else is affected, and what triggered it. That's a fundamentally different experience than triaging a flood of raw alerts from a monitoring dashboard.
