| 1 | +# Cleanup Orphaned and Stale Workflow Contexts |
| 2 | + |
| 3 | +**Purpose:** Automated maintenance script to identify and cancel stale or orphaned workflow contexts in ServiceNow |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +This fix script addresses a common ServiceNow instance health issue: workflow contexts that remain stuck in the "Executing" state indefinitely because the parent record was orphaned or the workflow never finished executing. Over time, these stale contexts accumulate and can affect system performance, reporting accuracy, and workflow metrics. The script identifies problematic workflow contexts using configurable criteria and cancels them. |
| 8 | + |
| 9 | +## Detailed Description |
| 10 | + |
| 11 | +### What Problem Does This Solve? |
| 12 | + |
| 13 | +In ServiceNow environments, workflow contexts (`wf_context` table) occasionally become "stuck" in an executing state when: |
| 14 | + |
| 15 | +1. **Parent Record Deletion:** The record that initiated the workflow gets deleted before workflow completion |
| 16 | +2. **Workflow Design Issues:** Workflows designed without proper completion logic |
| 17 | +3. **System Errors:** Database issues, timeouts, or system failures interrupt workflow execution |
| 18 | +4. **Data Integrity Problems:** Missing or corrupted table references in the context record |
| 19 | + |
| 20 | +These orphaned workflows continue to show as "executing" indefinitely, creating false metrics and consuming system resources during workflow engine operations. |
| 21 | + |
| 22 | +### How It Works |
| 23 | + |
| 24 | +The script operates through a systematic validation and cleanup process: |
| 25 | + |
| 26 | +#### **Phase 1: Configuration & Initialization** |
| 27 | +- Establishes a time threshold (default: 180 days) to identify long-running workflows |
| 28 | +- Calculates the cutoff date by subtracting the threshold from the current date |
| 29 | +- Sets up batch processing limits to prevent transaction timeouts |
| 30 | +- Initializes counters for tracking processed, cancelled, and orphaned workflows |
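The cutoff calculation described above can be sketched in plain JavaScript. This is an illustration of the arithmetic only, not the script's actual code: `computeCutoff` and `MS_PER_DAY` are hypothetical names, and on an instance the same subtraction would more likely be done with `GlideDateTime`.

```javascript
// Illustrative sketch: helper names are assumptions, not taken from the script.
var DAYS_THRESHOLD = 180;               // default documented in this README
var MS_PER_DAY = 24 * 60 * 60 * 1000;   // milliseconds in one day

function computeCutoff(now, daysThreshold) {
    // Subtract the threshold, expressed in milliseconds, from the reference time.
    return new Date(now.getTime() - daysThreshold * MS_PER_DAY);
}

// Example: a run on 30 June 2025 (UTC) with the default threshold.
var cutoff = computeCutoff(new Date(Date.UTC(2025, 5, 30)), DAYS_THRESHOLD);
var cutoffIso = cutoff.toISOString();   // "2025-01-01T00:00:00.000Z"
```

Working in UTC milliseconds keeps the result independent of the server's time zone and of daylight saving changes.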
| 31 | + |
| 32 | +#### **Phase 2: Identification** |
| 33 | +The script queries the `wf_context` table for workflows matching these criteria: |
| 34 | +- **State:** Currently in "executing" status |
| 35 | +- **Age:** Created more than `DAYS_THRESHOLD` days ago (default: 180) |
| 36 | +- **Batch Limit:** Processes up to `BATCH_SIZE` records per execution (default: 500) |
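A minimal sketch of how these three criteria might map onto a `GlideRecord` query. The function name `queryStaleContexts` is hypothetical, the constructor is passed in as a parameter only so the sketch can be exercised outside an instance, and the `orderBy` call is an assumption (the description does not say in which order records are fetched).

```javascript
// Sketch only: in a Fix Script you would pass the global GlideRecord as GlideRecordCtor.
function queryStaleContexts(GlideRecordCtor, cutoffValue, batchSize) {
    var gr = new GlideRecordCtor('wf_context');
    gr.addQuery('state', 'executing');                // State: still executing
    gr.addQuery('sys_created_on', '<', cutoffValue);  // Age: created before the cutoff
    gr.orderBy('sys_created_on');                     // assumption: oldest contexts first
    gr.setLimit(batchSize);                           // Batch limit
    gr.query();
    return gr;                                        // caller iterates with gr.next()
}
```

On an instance this could be called as `queryStaleContexts(GlideRecord, cutoff, BATCH_SIZE)`, where `cutoff` is the date value computed in Phase 1.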
| 37 | + |
| 38 | +#### **Phase 3: Validation** |
| 39 | +For each identified workflow context, the script performs validation checks: |
| 40 | + |
| 41 | +1. **Reference Validation:** Verifies the workflow has valid table and record ID references |
| 42 | + - If either `table` or `id` field is empty → Mark for cancellation |
| 43 | + - Reason: "Missing table or record reference" |
| 44 | + |
| 45 | +2. **Parent Record Validation:** Checks if the parent record still exists |
| 46 | + - Queries the parent table using the stored record ID |
| 47 | + - If record cannot be retrieved → Mark for cancellation |
| 48 | + - Reason: "Parent record no longer exists" |
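The two checks above can be expressed as a single function that returns the cancellation reason, or `null` when the context should be left alone. This is a sketch under assumptions: `findOrphanReason` and the `recordExists` callback are hypothetical names; on an instance the callback could wrap `new GlideRecord(table).get(id)`.

```javascript
// Sketch only: `context` is a plain object holding the wf_context `table` and `id` values.
function findOrphanReason(context, recordExists) {
    // Check 1: both references must be populated.
    if (!context.table || !context.id) {
        return 'Missing table or record reference';
    }
    // Check 2: the parent record must still be retrievable.
    if (!recordExists(context.table, context.id)) {
        return 'Parent record no longer exists';
    }
    return null; // parent still exists, so the context is not treated as orphaned
}
```

Running the reference check first avoids querying a parent table with an empty table name or record ID.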
| 49 | + |
| 50 | +#### **Phase 4: Cleanup Execution** |
| 51 | +For workflows marked for cancellation: |
| 52 | +- **Dry Run Mode (Default):** Logs findings without making changes |
| 53 | +- **Execution Mode:** When `DRY_RUN = false`: |
| 54 | + - Sets workflow state to "cancelled" |
| 55 | + - Calls `setWorkflow(false)` so that business rules (and any workflows they would start) do not run during the update |
| 56 | + - Updates the workflow context record |
| 57 | + - Increments cancellation counter |
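The dry run guard and the update sequence might look like the following sketch. `cancelContext` is a hypothetical helper name; `contextRecord` is assumed to be a `GlideRecord` positioned on a `wf_context` row, and logging is left to the caller.

```javascript
// Sketch only: returns true when the record was actually updated.
function cancelContext(contextRecord, dryRun) {
    if (dryRun) {
        return false;                              // dry run: report only, change nothing
    }
    contextRecord.setValue('state', 'cancelled');
    contextRecord.setWorkflow(false);              // skip business rules for this update
    contextRecord.update();
    return true;                                   // caller increments the cancellation counter
}
```

Note that `setWorkflow(false)` has to be called before `update()` for it to take effect on that update.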
| 58 | + |
| 59 | +#### **Phase 5: Reporting** |
| 60 | +Generates comprehensive execution logs including: |
| 61 | +- Threshold date applied |
| 62 | +- Execution mode (dry run vs. actual) |
| 63 | +- Total workflows processed |
| 64 | +- Number of orphaned workflows identified |
| 65 | +- Number of workflows cancelled (if executed) |
| 66 | +- Individual workflow details with cancellation reasons |
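As an illustration, the summary portion of such a log could be assembled as shown below. The function name, the `stats` object shape, and most label strings are assumptions; only "Total Processed" and "Orphaned Workflows Found" are labels mentioned elsewhere in this README.

```javascript
// Sketch only: builds the summary text; on an instance it could be passed to gs.info().
function buildSummary(stats) {
    return [
        'Threshold date: ' + stats.thresholdDate,
        'Mode: ' + (stats.dryRun ? 'DRY RUN' : 'EXECUTE'),
        'Total Processed: ' + stats.processed,
        'Orphaned Workflows Found: ' + stats.orphaned,
        'Workflows Cancelled: ' + stats.cancelled
    ].join('\n');
}
```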
| 67 | + |
| 68 | +## Configuration Variables |
| 69 | + |
| 70 | +### `DAYS_THRESHOLD` |
| 71 | +- **Type:** Integer |
| 72 | +- **Default:** 180 |
| 73 | +- **Purpose:** Defines how old a workflow must be (in days) to be evaluated for cleanup |
| 74 | +- **Recommendation:** Start with 180 days; adjust based on your organization's longest-running legitimate workflows |
| 75 | + |
| 76 | +### `BATCH_SIZE` |
| 77 | +- **Type:** Integer |
| 78 | +- **Default:** 500 |
| 79 | +- **Purpose:** Limits the number of records processed in a single execution to prevent database transaction timeouts |
| 80 | +- **Recommendation:** Keep at 500 for most instances; reduce to 250-300 if you experience timeout errors |
| 81 | + |
| 82 | +### `DRY_RUN` |
| 83 | +- **Type:** Boolean |
| 84 | +- **Default:** `true` |
| 85 | +- **Purpose:** Safety mechanism that logs findings without making actual changes |
| 86 | +- **Critical:** Always run in dry run mode first to review what would be affected |
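Taken together, the configuration block at the top of the script would presumably look something like this, using the documented defaults. The exact declarations in the script may differ.

```javascript
// Configuration (defaults as documented above)
var DAYS_THRESHOLD = 180; // minimum age, in days, before a context is evaluated
var BATCH_SIZE = 500;     // maximum number of records processed per execution
var DRY_RUN = true;       // true = log findings only; false = cancel contexts
```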
| 87 | + |
| 88 | +## Prerequisites & Dependencies |
| 89 | + |
| 90 | +### Required Access |
| 91 | +- **Admin Role:** Required to execute background scripts and modify workflow contexts |
| 92 | +- **Table Access:** Read/write access to `wf_context` table |
| 93 | + |
| 94 | +### Scope Requirements |
| 95 | +- Should be executed in **Global scope** |
| 96 | +- Can be run as a Fix Script or Background Script |
| 97 | + |
| 98 | +### Testing Requirements |
| 99 | +- **Mandatory:** Test in non-production environment first |
| 100 | +- Verify workflows being cancelled are truly orphaned |
| 101 | +- Review dry run logs before actual execution |
| 102 | + |
| 103 | +## Execution Instructions |
| 104 | + |
| 105 | +### Step 1: Prepare the Script |
| 106 | +1. Copy the script to a Background Script or Fix Script module |
| 107 | +2. Review and adjust configuration variables based on your requirements |
| 108 | +3. Ensure `DRY_RUN = true` for initial execution |
| 109 | + |
| 110 | +### Step 2: Dry Run Execution |
| 111 | +1. Execute the script in dry run mode |
| 112 | +2. Review the System Logs for identified workflows |
| 113 | +3. Validate that workflows marked for cancellation are legitimate candidates |
| 114 | +4. Note the "Total Processed" and "Orphaned Workflows Found" counts |
| 115 | + |
| 116 | +### Step 3: Actual Execution |
| 117 | +1. Set `DRY_RUN = false` in the script |
| 118 | +2. Re-execute the script |
| 119 | +3. Monitor execution logs |
| 120 | +4. Verify workflows are successfully cancelled in the `wf_context` table |
| 121 | + |
| 122 | +### Step 4: Schedule (Optional) |
| 123 | +For ongoing maintenance, consider: |
| 124 | +- Creating a scheduled job to run this monthly or quarterly |
| 125 | +- Adjusting `DAYS_THRESHOLD` to a lower value for regular maintenance (e.g., 90 days) |
| 126 | +- Implementing notifications for cleanup execution results |
| 127 | + |
| 128 | +## Use Cases |
| 129 | + |
| 130 | +### Common Scenarios for This Script |
| 131 | + |
| 132 | +1. **Instance Health Maintenance:** Regular cleanup as part of quarterly instance maintenance activities |
| 133 | +2. **Pre-Upgrade Cleanup:** Clearing stale data before major version upgrades |
| 134 | +3. **Performance Optimization:** Reducing the number of stale "executing" contexts in `wf_context` when workflow reports show high executing counts |
| 135 | +4. **Data Migration Cleanup:** After bulk record deletions or data migrations that leave orphaned workflows |
| 136 | +5. **Workflow Redesign Projects:** Cleaning up contexts from deprecated or redesigned workflows |
| 137 | + |
| 138 | +## Best Practices |
| 139 | + |
| 140 | +### Safety Measures |
| 141 | +- Always start with dry run mode enabled |
| 142 | +- Test in sub-production environments first |
| 143 | +- Document workflows being cancelled before execution |
| 144 | +- Schedule during low-usage maintenance windows |
| 145 | + |
| 146 | +### Monitoring |
| 147 | +- Review System Logs after each execution |
| 148 | +- Compare before/after counts of executing contexts in the `wf_context` table |
| 149 | +- Verify no legitimate long-running workflows are impacted |
| 150 | +- Monitor workflow execution metrics post-cleanup |
| 151 | + |
| 152 | +### Maintenance Schedule |
| 153 | +- Run quarterly for preventive maintenance |
| 154 | +- Run immediately if you notice unusually high executing workflow counts |
| 155 | +- Adjust `DAYS_THRESHOLD` based on your environment's workflow patterns |
| 156 | + |
| 157 | +## Technical Considerations |
| 158 | + |
| 159 | +### Performance Impact |
| 160 | +- **Batch Processing:** Limits database load through `BATCH_SIZE` control |
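| 161 | +- **Query Efficiency:** Filters on `state` and `sys_created_on`; confirm these fields are indexed on your instance if the query runs slowly |
| 162 | +- **Transaction Management:** `setWorkflow(false)` stops business rules from running on each update, avoiding cascading operations |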
| 163 | + |
| 164 | +### Data Integrity |
| 165 | +- **No Data Loss:** Only cancels workflows; doesn't delete parent records |
| 166 | +- **Audit Trail:** All cancellations are logged in System Logs |
| 167 | +- **Record Retention:** Cancelled contexts are not deleted and remain in the `wf_context` table for audit purposes |
| 168 | + |
| 169 | +### Limitations |
| 170 | +- Processes only one batch per execution; may need multiple runs for large datasets |
| 171 | +- Focuses on orphaned workflows; doesn't detect all types of stuck workflows |
| 172 | +- Requires manual verification for workflows without obvious orphaning issues |
| 173 | + |
| 174 | +## Support & Enhancement Ideas |
| 175 | + |
| 176 | +### Potential Enhancements |
| 177 | +1. Add email notification summarizing cleanup results |
| 178 | +2. Implement additional validation for workflows stuck in activities |
| 179 | +3. Create reporting dashboard for workflow health metrics |
| 180 | +4. Add support for archiving cancelled contexts to secondary table |
| 181 | +5. Include validation for workflows without any executing activities |
| 182 | + |
| 183 | +*** |
| 184 | + |
| 185 | +This script is a practical maintenance tool that helps keep your ServiceNow instance healthy by addressing a common technical debt issue with workflow contexts, improving system performance and data accuracy. |
| 186 | + |
| 187 | +## Authors |
| 188 | +Masthan Sharif Shaik (<a href="https://www.linkedin.com/in/nowsharif/" target="_blank">LinkedIn</a>, <a href="https://www.servicenow.com/community/user/viewprofilepage/user-id/668622" target="_blank">SN Community</a>) |
| 189 | + |
| 190 | +## Version History |
| 191 | +* 0.1 |
| 192 | + * Initial Release |