Commit c853f81

Revise AI RCA documentation to streamline steps for enabling features, configuring analysis scope, and managing custom RCA categories. Added best practices and examples for clarity. atx-6210
1 parent ca16c94 commit c853f81

1 file changed

+96
-54
lines changed

docs/analytics-ai-root-cause-analysis.md

Lines changed: 96 additions & 54 deletions
@@ -72,61 +72,14 @@ AI RCA is an intelligent feature that uses advanced machine learning algorithms
7272

7373
### Step 2: Enable AI RCA
7474

75-
1. **Toggle the Feature**: Use the blue toggle switch to enable "Automatic AI RCA"
76-
2. **Configure Analysis Scope**: Choose which types of test failures to analyze:
77-
- **All failures**: Analyze every failed test, regardless of previous status
78-
- **New failures**: Analyze only tests that have failed recently after having passed at least 10 consecutive times previously.
79-
- **Consistent Failures**: Analyze only tests that have failed in all of their previous 5 runs to identify persistent issues.
75+
**Toggle the Feature**: Use the blue toggle switch to enable "Automatic AI RCA"
8076

81-
### Step 3: Set Special Instructions (Optional)
77+
### Step 3: Configure Analysis Scope
8278

83-
Provide context or specific guidance for the AI to consider during analysis:
84-
85-
1. Click on the **Special Instructions** section
86-
2. Enter any special instructions or context that should be considered during AI root cause analysis
87-
3. Use the "Show examples" link for guidance on effective instruction writing
88-
89-
**Example Instructions:**
90-
91-
:::tip
92-
Our CRM application has specific failure patterns to watch for:
93-
94-
**PRIORITY CATEGORIES**
95-
1. **Database Connection Issues** - Our PostgreSQL connection pool is limited to 20 connections. Look for connection timeouts, pool exhaustion, or slow query performance.
96-
97-
2. **Third-party API Failures** - We integrate with Salesforce, HubSpot, and Mailchimp. These external APIs often have rate limits and intermittent failures that cause our tests to fail.
98-
99-
3. **File Upload/Processing Issues** - Contact import via CSV files often fails due to file size limits (10MB max) or malformed data. Check for upload timeouts and validation errors.
100-
101-
4. **Authentication/Authorization** - We use OAuth 2.0 with multiple providers. Token expiration and permission changes frequently cause test failures.
102-
103-
5. **UI Element Timing Issues** - Our CRM uses dynamic loading for contact lists and reports. Elements may not be ready when tests try to interact with them.
104-
105-
**SPECIFIC CONTEXT**
106-
- Our test environment has limited resources compared to production
107-
- We run tests during business hours when external APIs are under heavy load
108-
- Focus on identifying whether failures are environment-specific or application bugs
109-
- Prioritize failures that affect core CRM functionality (contact management, lead tracking, reporting)
110-
- Consider our custom error handling - we log all errors to Sentry and show user-friendly messages
111-
112-
**IGNORE THESE COMMON FALSE POSITIVES**
113-
- Browser console warnings that don't affect functionality
114-
- Network requests to analytics services (Google Analytics, Hotjar)
115-
- Minor UI layout shifts that don't break functionality
116-
:::
117-
118-
**Possible Categories and Descriptions:**
119-
120-
| Category | Description |
121-
|----------|-------------|
122-
| **Database Issues** | Connection timeouts, query performance, data integrity problems |
123-
| **API Integration** | Third-party service failures, rate limiting, authentication issues |
124-
| **UI/UX Problems** | Element not found, timing issues, responsive design failures |
125-
| **Performance Issues** | Slow page loads, memory leaks, resource exhaustion |
126-
| **Environment Issues** | Test data problems, configuration mismatches, infrastructure failures |
127-
| **Authentication/Authorization** | Login failures, permission errors, session timeouts |
128-
| **File Processing** | Upload failures, format validation, processing timeouts |
129-
| **Network Issues** | Connectivity problems, DNS failures, proxy issues |
79+
In the **Analysis Scope** section, choose which types of test failures to analyze:
80+
- **All failures**: Analyze every failed test, regardless of previous status
81+
- **New failures**: Analyze only tests that failed recently after passing at least 10 consecutive runs.
82+
- **Consistent Failures**: Analyze only tests that failed in all of their previous 5 runs, to identify persistent issues.
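
The three scope options can be read as predicates over a test's run history. The sketch below is illustrative only and is not the product's implementation: the function names and the oldest-first list of pass/fail booleans are hypothetical, and it assumes that "previous" runs means the runs immediately before the latest failed run.

```python
from typing import List

# Thresholds taken from the scope descriptions above.
NEW_FAILURE_PASS_STREAK = 10
CONSISTENT_FAILURE_RUNS = 5


def is_new_failure(history: List[bool]) -> bool:
    """history is oldest-first; True means the run passed (assumed layout)."""
    if not history or history[-1]:
        return False  # the latest run did not fail
    previous = history[:-1]
    if len(previous) < NEW_FAILURE_PASS_STREAK:
        return False  # not enough earlier runs to show a pass streak
    return all(previous[-NEW_FAILURE_PASS_STREAK:])


def is_consistent_failure(history: List[bool]) -> bool:
    """Latest run failed and so did each of the 5 runs before it (assumption)."""
    if not history or history[-1]:
        return False
    previous = history[:-1]
    if len(previous) < CONSISTENT_FAILURE_RUNS:
        return False
    return not any(previous[-CONSISTENT_FAILURE_RUNS:])


def matches_scope(scope: str, history: List[bool]) -> bool:
    """Return True if a test with this history would be analyzed under `scope`."""
    if scope == "all":
        return bool(history) and not history[-1]
    if scope == "new":
        return is_new_failure(history)
    if scope == "consistent":
        return is_consistent_failure(history)
    raise ValueError(f"unknown scope: {scope}")
```

Under these assumptions, a test with ten passes followed by one failure matches the "new" scope, while a test whose last six runs all failed matches the "consistent" scope.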
13083

13184
### Step 4: Configure Intelligent Targeting
13285

@@ -181,8 +134,94 @@ The intelligent targeting system applies rules using the following logic:
181134
**Result**: AI-powered analysis will run only on production tests (excluding non-critical ones) from hourly builds, focusing on Playwright or HyperExecute test tags, while excluding smoke tests. The analysis will target ecommerce and payment projects, excluding staging projects. This configuration helps narrow down analysis to the most critical test scenarios.
182135
:::
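
One way to reason about the example result above is as a set of include and exclude rules evaluated per field. The sketch below is a hypothetical model, not the product's rule engine: the `Rule` class, the field names, and the matching semantics (case-insensitive substring match, any include term may match, every field must pass) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rule:
    """Hypothetical include/exclude rule for one field (e.g. tags)."""
    include: List[str] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)

    def allows(self, values: List[str]) -> bool:
        lowered = [v.lower() for v in values]

        def hit(term: str) -> bool:
            return any(term.lower() in v for v in lowered)

        # Any exclude match rejects the test outright (assumed precedence).
        if any(hit(term) for term in self.exclude):
            return False
        # An empty include list places no restriction on the field.
        return not self.include or any(hit(term) for term in self.include)


def should_analyze(rules: Dict[str, Rule], test: Dict[str, List[str]]) -> bool:
    """A test is analyzed only if every field's rule allows it."""
    return all(rule.allows(test.get(name, [])) for name, rule in rules.items())


# Rules mirroring the example result described above.
RULES = {
    "test_name": Rule(include=["production"], exclude=["non-critical"]),
    "build_name": Rule(include=["hourly"]),
    "tags": Rule(include=["playwright", "hyperexecute"], exclude=["smoke"]),
    "project": Rule(include=["ecommerce", "payment"], exclude=["staging"]),
}

candidate = {
    "test_name": ["production-checkout"],
    "build_name": ["hourly-1042"],
    "tags": ["playwright", "regression"],
    "project": ["ecommerce-web"],
}
```

With these assumed semantics, `should_analyze(RULES, candidate)` returns `True`; adding a `smoke` tag or moving the test to a staging project would exclude it.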
183136

137+
### Step 5: Manage Custom RCA Categories (Optional)
138+
139+
Custom RCA Categories let you define the categories that the AI uses to classify test failure analysis results automatically. This helps you group similar failures together, track trends, and prioritize fixes more effectively.
140+
141+
#### Managing Categories
142+
143+
1. In the **Automatic AI RCA** configuration page, locate the **Custom RCA Categories** section
144+
2. Click the **Manage** button to open the category management drawer
145+
3. **Create**: Click **Add Category**, enter a name and description, select **Active** or **Inactive** status, then click **Create RCA Category**
146+
4. **Edit**: Click the edit icon on any category card to modify its details
147+
5. **Delete**: Click the delete icon and confirm to remove a category
148+
6. **Search**: Use the search box to filter categories by name or description
149+
150+
**Category Status:**
151+
- **Active**: Used by AI for automatic classification and appears in RCA results
152+
- **Inactive**: Saved but not used for classification; can be reactivated later
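
To make the status and search behavior concrete, here is a minimal sketch that models categories locally. It does not call any product API: the `RcaCategory` class and both helper functions are hypothetical, and the case-insensitive substring search is an assumption about how the search box behaves.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RcaCategory:
    """Hypothetical local model of a custom RCA category."""
    name: str
    description: str
    active: bool = True  # Inactive categories are saved but not used


def active_categories(categories: List[RcaCategory]) -> List[RcaCategory]:
    """Only active categories take part in classification."""
    return [c for c in categories if c.active]


def search_categories(categories: List[RcaCategory], query: str) -> List[RcaCategory]:
    """Filter by name or description, case-insensitively (assumed behavior)."""
    q = query.strip().lower()
    return [
        c for c in categories
        if q in c.name.lower() or q in c.description.lower()
    ]


categories = [
    RcaCategory("UI Element Not Found", "Tests cannot locate expected UI elements"),
    RcaCategory("API Timeout Errors", "API requests exceeding timeout thresholds"),
    RcaCategory("Network Connectivity Issues", "DNS failures or proxy issues", active=False),
]
```

In this model, deactivating a category removes it from `active_categories` while keeping it searchable, which mirrors the "saved but not used" description of the Inactive status.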
153+
154+
**Best Practices:**
155+
156+
:::tip
157+
- **Be Specific**: Create distinct categories (e.g., "Database Connection Timeouts" rather than "Database Issues")
158+
- **Use Clear Names**: Choose names your team understands immediately
159+
- **Start Small**: Begin with 5-10 active categories for your most common failure types
160+
- **Review Regularly**: Periodically refine categories based on your failure patterns
161+
:::
162+
163+
**Example Custom RCA Categories:**
164+
165+
| Category Name | Description |
166+
|--------------|-------------|
167+
| **UI Element Not Found** | Failures where tests cannot locate expected UI elements due to timing issues, selector changes, or DOM modifications |
168+
| **API Timeout Errors** | Failures caused by API requests exceeding timeout thresholds, often related to third-party service reliability |
169+
| **Database Connection Issues** | Failures due to database connection pool exhaustion, connection timeouts, or query performance problems |
170+
| **Authentication Token Expiration** | Failures related to expired or invalid authentication tokens, session timeouts, or OAuth refresh issues |
171+
| **Network Connectivity Issues** | Failures caused by network interruptions, DNS failures, proxy issues, or unstable network connections |
172+
173+
### Step 6: Set Special Instructions (Optional)
174+
175+
Provide context or specific guidance for the AI to consider during analysis:
176+
177+
1. Click on the **Special Instructions** section
178+
2. Enter any special instructions or context that should be considered during AI root cause analysis
179+
3. Use the "Show examples" link for guidance on effective instruction writing
180+
181+
**Example Instructions:**
182+
183+
:::tip
184+
**Environment-Specific Context:**
185+
- Running on Staging environment with test data
186+
- Database may have lag issues during peak hours (9 AM - 5 PM EST)
187+
- Test environment has limited resources compared to production (2GB RAM vs 8GB)
188+
- Network latency is higher in test environment (average 150ms vs 50ms in production)
189+
190+
**Known Issues & Patterns:**
191+
- Payment gateway timeouts during high traffic periods (especially between 2-4 PM)
192+
- Cache invalidation issues occur immediately after deployments
193+
- Third-party API rate limits: Salesforce (1000 requests/hour), HubSpot (500 requests/hour)
194+
- Database connection pool is limited to 20 connections - look for pool exhaustion patterns
195+
- OAuth token expiration happens every 24 hours - failures around token refresh time are expected
196+
197+
**Analysis Preferences:**
198+
- Focus on recent failures over recurring issues when prioritizing
199+
- Consider browser compatibility differences (Chrome vs Firefox behavior variations)
200+
- Check for timing-related failures (elements loading asynchronously)
201+
- Distinguish between environment-specific issues and application bugs
202+
- Prioritize failures affecting core user journeys: Login, Checkout, Dashboard, Profile Management
203+
204+
**Business Context:**
205+
- Critical user journeys: Login, Checkout, Dashboard, Profile Management
206+
- Performance thresholds: Page load < 3s, API response < 500ms
207+
- Peak usage hours: 10 AM - 2 PM and 6 PM - 9 PM EST
208+
- High-value features: Payment processing, Order management, Customer support portal
209+
210+
**Technical Constraints:**
211+
- Flaky network connections in mobile tests (use retry logic)
212+
- Third-party service dependencies may be unstable (payment gateway, email service)
213+
- Custom error handling: All errors logged to Sentry, user-friendly messages displayed
214+
- Test data cleanup runs nightly, so some data may be stale during the day
215+
216+
**Ignore These Common False Positives:**
217+
- Browser console warnings that don't affect functionality
218+
- Network requests to analytics services (Google Analytics, Hotjar, Mixpanel)
219+
- Minor UI layout shifts that don't break functionality (< 5px)
220+
- Expected 404s for optional resources (favicon, tracking pixels)
221+
- Third-party script loading delays that don't impact core functionality
222+
:::
184223

185-
### Step 5: Save Configuration
224+
### Step 7: Save Configuration
186225

187226
1. Click **Save Configuration** to apply your settings
188227
2. The settings apply to all users in your organization. Individual users cannot modify them; changes require admin-level privileges.
@@ -268,6 +307,7 @@ The RCA Category Trends widget in Insights enables you to:
268307
- **Start with "All failures"** to get comprehensive coverage, then refine based on your needs
269308
- **Use specific special instructions** to guide the AI toward your most critical issues
270309
- **Set up intelligent targeting** to focus on relevant test suites and exclude noise
310+
- **Create custom RCA categories** to organize and track failure patterns systematically
271311

272312
### 2. Interpreting Results
273313

@@ -281,6 +321,7 @@ The RCA Category Trends widget in Insights enables you to:
281321
- **Review RCA accuracy** and provide feedback when possible
282322
- **Monitor trend analysis** to identify recurring patterns
283323
- **Update special instructions** based on new insights and requirements
324+
- **Refine custom RCA categories** to better match your failure patterns and organizational needs
284325
- **Share RCA results** with your team to improve collective understanding
285326

286327
<!-- ### 4. Integration with Workflow
@@ -309,6 +350,7 @@ The RCA Category Trends widget in Insights enables you to:
309350
- **Refine special instructions**: Provide more specific context about your application
310351
- **Update intelligent targeting**: Exclude irrelevant tests that might confuse the analysis
311352
- **Review error categorization**: Ensure test failures are properly categorized
353+
- **Refine custom RCA categories**: Update category descriptions to better match your failure patterns
312354
- **Provide feedback**: Use any available feedback mechanisms to improve accuracy
313355

314356
</details>
