Commit feb91d5

authored
Merge pull request #2710 from port-labs/PORT-16061
Add Documentation for Azure incremental Sync Vs Non Incremental Sync
2 parents 536d385 + cd4937b commit feb91d5

File tree

2 files changed

+301
-0
lines changed

docs/build-your-software-catalog/sync-data-to-catalog/cloud-providers/azure/azure.md

Lines changed: 21 additions & 0 deletions
@@ -38,10 +38,31 @@ The Azure exporter can retrieve all the resources supported by the [Azure Resour
3838

3939
For examples on how to map resources head to the [resource templates](/build-your-software-catalog/sync-data-to-catalog/cloud-providers/azure/resource_templates/resource_templates.md) page.
4040

41+
## Sync approaches
42+
43+
Port offers multiple approaches for synchronizing Azure resources, each suited for different use cases:
44+
45+
### Azure exporter (ocean-based)
46+
- **Full resource scanning** for complete state synchronization via **Azure Resource Manager (ARM) REST API**.
47+
- **Change notifications** via Azure Event Grid (**available only in the Terraform deployment**).
48+
- **Managed deployment** via Helm, Docker, or ContainerApp.
49+
- **Best for**: Production environments requiring comprehensive resource visibility, full resource schema, and real-time Event Grid updates (with Terraform).
50+
51+
### Azure incremental sync (standalone)
52+
- **Lightweight change detection** via Azure Resource Graph.
53+
- **Efficient polling** with configurable time windows.
54+
- **GitHub Actions deployment** for automated workflows.
55+
- **Best for**: Environments that need lightweight change tracking through scheduled polling, can accept a partial schema (limited to Azure Resource Graph fields), and do not have Event Grid infrastructure.
56+
57+
:::tip Choosing the right approach
58+
Use the Azure exporter when you need comprehensive resource scanning and can set up Event Grid for change notifications. Use the incremental sync integration when you want lightweight, efficient synchronization with minimal resource usage, do not have Event Grid infrastructure, and can accept partial schema coverage.
59+
:::
60+
4161
## Next Steps
4262

4363
- Refer to the [Resource Templates](/build-your-software-catalog/sync-data-to-catalog/cloud-providers/azure/resource_templates/resource_templates.md) page for templates on how to map Azure resources to Port.
4464
- Check out the [Azure Multi Subscriptions](/build-your-software-catalog/sync-data-to-catalog/cloud-providers/azure/multi-subscriptions.md) guide for setting up synchronization of Azure resources.
65+
- Learn about [Azure Incremental Sync](/build-your-software-catalog/sync-data-to-catalog/cloud-providers/azure/incremental-sync.md) for lightweight, efficient change-based synchronization.
4566

4667
## Configuration
4768

Lines changed: 280 additions & 0 deletions
@@ -0,0 +1,280 @@
1+
---
2+
sidebar_position: 4
3+
---
4+
5+
# Azure incremental sync integration
6+
7+
:::info Standalone Integration
8+
This is a **separate, standalone integration** that runs independently from the Port Azure exporter. It's designed for lightweight, efficient synchronization of Azure resources using Azure Resource Graph change detection.
9+
:::
10+
11+
## Overview
12+
13+
The Azure incremental sync integration provides a lightweight, efficient way to synchronize Azure resources to Port by detecting and ingesting only recent changes. Unlike the Azure exporter, which relies on full rescans or an Event Grid setup, this integration queries Azure Resource Graph's change history tables to identify modifications within a configurable time window.
14+
15+
## How it works
16+
17+
### Change detection via Azure Resource Graph
18+
19+
The integration queries Azure Resource Graph's change history tables:
20+
21+
- **`resourcechanges`** - For individual Azure resources (VMs, storage accounts, etc.).
22+
- **`resourcecontainerchanges`** - For resource containers (subscriptions, resource groups).
23+
24+
### Query strategy
25+
26+
1. **Incremental Mode**: Queries changes within a configurable time window (default: 15 minutes).
27+
2. **Full sync**: You can run a manual full sync workflow once to get all existing Azure resources into Port before relying on incremental polling.
28+
3. **Smart Joins**: Combines change data with current resource metadata for complete information.
29+
30+
31+
### Key benefits
32+
33+
| Approach | Advantages | Considerations |
|----------|------------|----------------|
| **Azure Exporter (Ocean-based)** - **Full Sync + Event Grid** | **Complete schema** from ARM APIs, initial full sync plus **push-based** near real-time updates via Event Grid (**Terraform deployment only**). | Requires Terraform deployment for Event Grid, heavier API usage for full scans. |
| **Azure Incremental Sync (Standalone)** - **Polling ARG** | **Lightweight** and cost-efficient, detects and ingests only recent changes via Azure Resource Graph, simple to deploy (e.g., GitHub Actions). | **Partial schema** (limited to ARG fields), **polling** must run frequently to avoid missed changes. |
37+
38+
39+
## When to use
40+
41+
- **Use the Azure Exporter (Ocean-based)** when you need the **full ARM schema** plus **near real-time** updates through **Event Grid** (best for production and comprehensive visibility).
42+
- **Use the Azure Incremental Sync** when you need **lightweight change tracking** through **Azure Resource Graph polling**, can accept a **partial schema**, and want a simple scheduled workflow (e.g., GitHub Actions).
43+
44+
## Prerequisites
45+
46+
### Azure setup
47+
48+
1. **Azure App Registration** with the following permissions:
   - **Azure Service Management**: `user_impersonation`
   - **Azure Resource Graph**: `Read` permission

2. **Role Assignments**:
   - `Reader` role on subscriptions for listing and Resource Graph access.

3. **Required Values**:
   - `AZURE_CLIENT_ID`: Azure service principal client ID
   - `AZURE_CLIENT_SECRET`: Azure service principal client secret
   - `AZURE_TENANT_ID`: Azure tenant ID
59+
60+
### Port setup
61+
62+
1. **Blueprints**: Create the required blueprints in Port before syncing.
63+
2. **Webhook**: Set up a webhook data source for ingesting Azure resources.
64+
3. **Webhook Mapping**: Configure the webhook mapping for Azure resource types.
65+
66+
## Configuration
67+
68+
### Environment variables
69+
70+
```bash
# Required
AZURE_CLIENT_ID=your_client_id
AZURE_CLIENT_SECRET=your_client_secret
AZURE_TENANT_ID=your_tenant_id
PORT_WEBHOOK_INGEST_URL=your_webhook_url

# Optional
SYNC_MODE=incremental  # incremental (default) or full
CHANGE_WINDOW_MINUTES=15  # Time window for change detection
SUBSCRIPTION_BATCH_SIZE=1000  # Subscriptions per batch (max 1000)
RESOURCE_TYPES='["microsoft.keyvault/vaults","Microsoft.Network/virtualNetworks"]'
RESOURCE_GROUP_TAG_FILTERS='{"include": {"Environment": "Production"}, "exclude": {"Temporary": "true"}}'
```
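
As an illustration of how these variables fit together, the sketch below reads them into a typed configuration object. It is not the integration's actual loader: the `SyncConfig` class and `load_config` function are hypothetical names, and the validation rules (rejecting unknown sync modes, clamping the batch size to the documented maximum of 1000) are assumptions based on the comments above.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class SyncConfig:
    """Hypothetical container for the settings listed above."""
    client_id: str
    client_secret: str
    tenant_id: str
    webhook_url: str
    sync_mode: str = "incremental"
    change_window_minutes: int = 15
    subscription_batch_size: int = 1000
    resource_types: list = field(default_factory=list)
    tag_filters: dict = field(default_factory=dict)


REQUIRED = (
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_TENANT_ID",
    "PORT_WEBHOOK_INGEST_URL",
)


def load_config(env=None) -> SyncConfig:
    """Build a SyncConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))

    sync_mode = env.get("SYNC_MODE", "incremental")
    if sync_mode not in ("incremental", "full"):
        raise ValueError(f"Unsupported SYNC_MODE: {sync_mode!r}")

    return SyncConfig(
        client_id=env["AZURE_CLIENT_ID"],
        client_secret=env["AZURE_CLIENT_SECRET"],
        tenant_id=env["AZURE_TENANT_ID"],
        webhook_url=env["PORT_WEBHOOK_INGEST_URL"],
        sync_mode=sync_mode,
        change_window_minutes=int(env.get("CHANGE_WINDOW_MINUTES", "15")),
        # Assumption: values above the documented maximum are clamped to 1000.
        subscription_batch_size=min(int(env.get("SUBSCRIPTION_BATCH_SIZE", "1000")), 1000),
        # The list and filter variables are JSON-encoded strings.
        resource_types=json.loads(env.get("RESOURCE_TYPES", "[]")),
        tag_filters=json.loads(env.get("RESOURCE_GROUP_TAG_FILTERS", "{}")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.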
84+
85+
### Resource group tag filtering
86+
87+
The integration supports powerful filtering based on resource group tags:
88+
89+
```json
{
  "include": {"Environment": "Production", "Team": "Platform"},
  "exclude": {"Temporary": "true", "Stage": "deprecated"}
}
```
95+
96+
**Filter logic:**
97+
- **Include filters**: ALL conditions must match (AND logic).
98+
- **Exclude filters**: ANY condition will exclude (OR logic).
99+
- **Combined**: Resources must match include criteria AND NOT match exclude criteria.
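
The filter logic above can be sketched in a few lines of Python. This is a minimal illustration rather than the integration's own code: the function name `matches_tag_filters` is hypothetical, and exact, case-sensitive comparison of tag keys and values is an assumption.

```python
def matches_tag_filters(tags, filters):
    """Return True if a resource group's tags pass the include/exclude filters.

    Assumption: keys and values are compared exactly (case-sensitive).
    """
    tags = tags or {}
    include = filters.get("include", {})
    exclude = filters.get("exclude", {})

    # Include: ALL conditions must match (AND logic).
    include_ok = all(tags.get(key) == value for key, value in include.items())
    # Exclude: ANY matching condition excludes the resource (OR logic).
    excluded = any(tags.get(key) == value for key, value in exclude.items())

    return include_ok and not excluded


filters = {
    "include": {"Environment": "Production", "Team": "Platform"},
    "exclude": {"Temporary": "true", "Stage": "deprecated"},
}
print(matches_tag_filters({"Environment": "Production", "Team": "Platform"}, filters))
print(matches_tag_filters({"Environment": "Production", "Team": "Platform", "Temporary": "true"}, filters))
```

With the example filter, the first call passes (both include tags match and no exclude tag is present), while the second is rejected because `Temporary` is set to `"true"`.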
100+
101+
## Deployment options
102+
103+
### GitHub Actions (recommended)
104+
105+
The integration is primarily designed to run via GitHub Actions workflows:
106+
107+
```yaml showLineNumbers
name: "Azure Incremental Sync"
on:
  schedule:
    - cron: "*/10 * * * *" # Every 10 minutes, shorter than the 15-minute change window

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          repository: port-labs/incremental-sync

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: |
          cd integrations/azure_incremental
          pip install poetry
          make install

      - name: Run sync
        run: |
          cd integrations/azure_incremental
          make run
        env:
          AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
          AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
          AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
          PORT_WEBHOOK_INGEST_URL: ${{ secrets.PORT_WEBHOOK_INGEST_URL }}
          CHANGE_WINDOW_MINUTES: 15
```
144+
145+
### Local execution
146+
147+
For development and testing:
148+
149+
```bash
# Clone the repository
git clone https://github.com/port-labs/incremental-sync.git
cd incremental-sync/integrations/azure_incremental

# Install dependencies
make install

# Set environment variables
export AZURE_CLIENT_ID=your_client_id
export AZURE_CLIENT_SECRET=your_client_secret
export AZURE_TENANT_ID=your_tenant_id
export PORT_WEBHOOK_INGEST_URL=your_webhook_url

# Run the integration
make run
```
166+
167+
## Azure Resource Graph queries
168+
169+
### Incremental resource query
170+
171+
The integration uses the following KQL query to find the latest change per resource within the change window, then joins it with current resource and resource group metadata:

```kusto showLineNumbers
resourcechanges
| extend changeTime=todatetime(properties.changeAttributes.timestamp)
| extend resourceId=tolower(tostring(properties.targetResourceId))
| extend changeType=tostring(properties.changeType)
| where changeTime > ago(15m)
| summarize arg_max(changeTime, *) by resourceId
| join kind=leftouter (
    resources
    | extend sourceResourceId=tolower(id)
    | project sourceResourceId, name, location, tags, subscriptionId, resourceGroup
) on $left.resourceId == $right.sourceResourceId
| join kind=leftouter (
    resourcecontainers
    | where type =~ 'microsoft.resources/subscriptions/resourcegroups'
    | project rgName=tolower(name), rgTags=tags, rgSubscriptionId=subscriptionId
) on $left.subscriptionId == $right.rgSubscriptionId and $left.resourceGroup == $right.rgName
```
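
To show how the time window in the query could follow `CHANGE_WINDOW_MINUTES`, here is a minimal sketch that renders the first stage of the query from Python. The function name `build_change_query` and the template are hypothetical, and the optional resource type filter on `properties.targetResourceType` is an assumption about how `RESOURCE_TYPES` might be applied, not the integration's confirmed behavior.

```python
# Simplified first stage of the change query; the metadata joins are omitted.
QUERY_TEMPLATE = """resourcechanges
| extend changeTime=todatetime(properties.changeAttributes.timestamp)
| extend resourceId=tolower(tostring(properties.targetResourceId))
| extend resourceType=tolower(tostring(properties.targetResourceType))
| where changeTime > ago({window}m){type_filter}
| summarize arg_max(changeTime, *) by resourceId"""


def build_change_query(window_minutes, resource_types=None):
    """Render the change query for a given window (in minutes)."""
    if not isinstance(window_minutes, int) or window_minutes <= 0:
        raise ValueError("window_minutes must be a positive integer")

    type_filter = ""
    if resource_types:
        # Lowercase the types so mixed-case input matches the lowercased column;
        # strip single quotes so values cannot break out of the KQL string literal.
        quoted = ", ".join(
            "'" + rtype.lower().replace("'", "") + "'" for rtype in resource_types
        )
        type_filter = f"\n| where resourceType in ({quoted})"

    return QUERY_TEMPLATE.format(window=window_minutes, type_filter=type_filter)


print(build_change_query(30, ["Microsoft.Network/virtualNetworks"]))
```

Keeping the window as a parameter means the polling schedule and the query can be adjusted together when the window is widened for troubleshooting.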
191+
192+
### Resource container query
193+
194+
For subscriptions and resource groups:
195+
196+
```kusto showLineNumbers
resourcecontainerchanges
| extend changeTime = todatetime(properties.changeAttributes.timestamp)
| extend resourceType = tostring(properties.targetResourceType)
| extend resourceId = tolower(tostring(properties.targetResourceId))
| extend changeType = tostring(properties.changeType)
| where changeTime > ago(15m)
| summarize arg_max(changeTime, *) by resourceId
| join kind=leftouter (
    resourcecontainers
    | extend sourceResourceId=tolower(id)
    | project sourceResourceId, type, name, location, tags, subscriptionId, resourceGroup
) on $left.resourceId == $right.sourceResourceId
```
210+
211+
## Performance considerations
212+
213+
### Rate limiting
214+
215+
The integration includes built-in rate limiting:
216+
- **Capacity**: 250 requests
217+
- **Refill Rate**: 25 requests per second
218+
- **Automatic backoff** when rate limits are exceeded
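
The capacity and refill rate above describe a token bucket. The following is a minimal sketch of that technique under stated assumptions: the `TokenBucket` class is hypothetical, and the integration's own limiter may differ in how it waits or backs off.

```python
import time


class TokenBucket:
    """Illustrative token bucket: 250 tokens, refilled at 25 tokens per second."""

    def __init__(self, capacity=250, refill_rate=25.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock  # injectable so the bucket can be tested without sleeping
        self._tokens = float(capacity)
        self._last = clock()

    def _refill(self):
        now = self._clock()
        elapsed = now - self._last
        # Add tokens for the elapsed time, never exceeding the capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now

    def try_acquire(self, tokens=1):
        """Take tokens if available; return False when the caller should back off."""
        self._refill()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False

    def wait_time(self, tokens=1):
        """Seconds until the requested number of tokens will be available."""
        self._refill()
        deficit = tokens - self._tokens
        return max(0.0, deficit / self.refill_rate)


bucket = TokenBucket()
if not bucket.try_acquire():
    time.sleep(bucket.wait_time())
```

With these numbers, a burst of up to 250 requests goes through immediately, after which throughput settles at roughly 25 requests per second.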
219+
220+
### Batch processing
221+
222+
- **Subscription batching**: Processes subscriptions in configurable batches (default: 1000).
223+
- **Resource batching**: Sends resources to Port in batches of 100 for optimal performance.
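
As a sketch of the resource batching described above, the helper below splits any iterable into lists of at most 100 items. The name `batched` is a hypothetical helper written for this example, not a function taken from the integration.

```python
from itertools import islice


def batched(items, size=100):
    """Yield successive lists of at most `size` items from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


resources = [{"id": f"resource-{n}"} for n in range(250)]
for batch in batched(resources, size=100):
    # Each batch would be sent to the Port webhook in a single request.
    print(len(batch))
```

Because the helper works on iterators, it does not need the full result set in memory before the first batch is sent.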
224+
225+
### Change window optimization
226+
227+
- **Default window**: 15 minutes
228+
- **Polling frequency**: Should be shorter than the change window.
229+
- **Recommended**: Poll every 5-10 minutes for a 15-minute window.
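
The relationship between polling frequency and change window can be checked with simple arithmetic. The sketch below is illustrative only: both function names are hypothetical, and the default 2-minute lag allowance is an assumption taken from the Resource Graph delay noted in the troubleshooting section.

```python
def overlap_minutes(poll_interval_minutes, change_window_minutes):
    """Minutes of history that two consecutive polls both cover."""
    return change_window_minutes - poll_interval_minutes


def is_schedule_safe(poll_interval_minutes, change_window_minutes, lag_minutes=2):
    """True if consecutive windows overlap by at least the expected Resource Graph lag."""
    return overlap_minutes(poll_interval_minutes, change_window_minutes) >= lag_minutes


print(is_schedule_safe(10, 15))  # 5 minutes of overlap
print(is_schedule_safe(15, 15))  # no overlap, so late-arriving changes can be missed
```

Overlapping windows mean the same change may be ingested more than once, so this approach relies on ingestion being safe to repeat for the same resource.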
230+
231+
## Troubleshooting
232+
233+
### Common issues
234+
235+
**No changes detected:**
236+
- Verify polling interval aligns with `CHANGE_WINDOW_MINUTES`.
237+
- Try increasing the time window (e.g., 30 minutes).
238+
- Check Azure Resource Graph permissions.
239+
240+
**Missing deletes:**
241+
- Ensure webhook mapping handles `changeType=Delete` correctly.
242+
- Verify Port webhook configuration for delete operations.
243+
244+
**Resource Graph delays:**
245+
- Allow for a 1-2 minute lag in Azure Resource Graph updates.
- Consider increasing `CHANGE_WINDOW_MINUTES` if needed.
247+
248+
### Logging
249+
250+
The integration provides detailed logging:
251+
- **Resource discovery**: Subscription and resource counts
252+
- **Query execution**: Azure Resource Graph query results
253+
- **Port operations**: Webhook ingestion status
254+
- **Rate limiting**: Automatic backoff notifications
255+
256+
257+
## Comparison with Azure exporter
258+
259+
| Feature | Azure Exporter | Incremental Sync Integration |
|---------|----------------|------------------------------|
| **Architecture** | Ocean-based integration | Standalone Python application |
| **APIs Used** | **Azure Resource Manager (ARM) REST API** + **Event Grid** | **Azure Resource Graph (ARG)** (`resources`, `resourcechanges`, `resourcecontainerchanges`) |
| **Schema Depth** | **Complete schema**: full set of fields from ARM APIs | **Partial schema**: limited to fields exposed by ARG tables |
| **Deployment** | Helm, Docker, ContainerApp | GitHub Actions, local execution |
| **Change Detection** | Event Grid, full rescans | Azure Resource Graph change history |
| **Real-time Updates** | Yes (**Event Grid, Terraform only**) | Near real-time (configurable polling) |
| **Resource Usage** | Higher (full resource scanning) | Lower (change-based detection) |
| **Setup Complexity** | Medium (Ocean integration) | Low (standalone app) |
269+
270+
## Next steps
271+
272+
1. **Review the [README](https://github.com/port-labs/incremental-sync)** for complete setup instructions.
273+
2. **Set up Azure app registration** with required permissions.
274+
3. **Create Port blueprints** for Azure resources.
275+
4. **Configure webhook mapping** for resource ingestion.
276+
5. **Deploy via GitHub Actions** or run locally for testing.
277+
278+
:::tip Best practice
279+
Start with incremental sync for ongoing operations and use full sync only for initial onboarding or when you need to ensure complete data consistency.
280+
:::
