
Commit 19fe1ef

chore: deprecate /api/git endpoint used by legacy git-integration (CM-767) (#3567)
1 parent 78aa333 commit 19fe1ef

File tree

3 files changed: +2 −151 lines changed


backend/src/api/integration/helpers/gitGetRemotes.ts

Lines changed: 0 additions & 31 deletions
This file was deleted.

backend/src/api/integration/index.ts

Lines changed: 0 additions & 1 deletion
@@ -72,7 +72,6 @@ export default (app) => {

   // Git
   app.put(`/git-connect`, safeWrap(require('./helpers/gitAuthenticate').default))
-  app.get('/git', safeWrap(require('./helpers/gitGetRemotes').default))
   app.put(`/confluence-connect`, safeWrap(require('./helpers/confluenceAuthenticate').default))
   app.put(`/gerrit-connect`, safeWrap(require('./helpers/gerritAuthenticate').default))
   app.get('/devto-validate', safeWrap(require('./helpers/devtoValidators').default))
Lines changed: 2 additions & 119 deletions
@@ -1,120 +1,3 @@
# CM Git integration - OUTDATED and needs to be updated!

The Git integration differs from most other integrations because the data is local-first. The goal is to get contributor information from the commits in a repo.

As configured, only one Git integration is allowed per deployment. This is enough, since LF has its own deployment.

The Git integration lives on its own EC2 instance. It gets a set of remotes, clones them, parses activities and members from them, and sends them back to crowd.dev. Because we need to use our queue systems to ingest data, the instance must have access to the VPC where the cluster lives.
7-
## Getting started

### Environment

There are some environment variables needed for this integration to work. They are stored in the [Git integration environment repo](https://github.com/CrowdDotDev/git-integration-environment/tree/main) for staging and production.

Expected environment variables (see the sketch after the list for how they might be read):

- `CROWD_HOST`: the URL to use to send requests to crowd.dev.
- `TENANT_ID`: the tenant in crowd.dev that will have the integration.
- `CROWD_API_KEY`: the API key for the crowd.dev user setting up the integration.
- `SQS_ENDPOINT_URL`: the endpoint URL to send messages to for ingesting activities.
- `SQS_REGION`: the region in which the SQS queue lives.
- `SQS_SECRET_ACCESS_KEY`: the secret access key for the account that has the SQS queue.
- `SQS_ACCESS_KEY_ID`: the access key ID for the account that has the SQS queue.
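As an illustration only (this helper is not part of the repo), a minimal sketch of how the integration could load this configuration with the standard library; the variable names are the ones listed above:

```
import os

# Hypothetical helper: read the expected environment variables,
# failing fast if a required one is missing.
REQUIRED_VARS = [
    "CROWD_HOST",
    "TENANT_ID",
    "CROWD_API_KEY",
    "SQS_ENDPOINT_URL",
    "SQS_REGION",
    "SQS_SECRET_ACCESS_KEY",
    "SQS_ACCESS_KEY_ID",
]

def load_config() -> dict:
    missing = [name for name in REQUIRED_VARS if name not in os.environ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```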
### Install

```
mkdir ~/venv/cgit && python -m venv ~/venv/cgit
source ~/venv/cgit/bin/activate
pip install --upgrade pip
pip install -e .
pip install ".[dev]"
```

## The integration

### Getting remotes

The remotes, which are the repos to clone, come from the database. We set up a normal integration that stores the remotes in its `settings`; this can be done through the UI.

We can get the remotes with a simple request:

```
import os

import requests

# CROWD_HOST, TENANT_ID and CROWD_API_KEY come from the environment (see above)
host = os.environ["CROWD_HOST"]
tenant_id = os.environ["TENANT_ID"]
api_key = os.environ["CROWD_API_KEY"]

url = f"{host}/api/tenant/{tenant_id}/git"
headers = {"Authorization": f"Bearer {api_key}"}

response = requests.get(url, headers=headers)

print(response.text)
```

### What data do we get from a commit?

A commit can contain multiple activities. For example, this is how a fairly complex commit might look (a sketch of how its trailers could be parsed follows the example):

- **Hash**: `7b50567bdcad8925ca1e075feb7171c12015afd1`
- **Author**:
  - **Name**: `Arnd Bergmann`
  - **Email**: `arnd@arndb.de`
  - **Date**: `2023-02-07 17:13:12+01:00`
- **Committer**:
  - **Name**: `Linus Torvalds`
  - **Email**: `torvalds@linux-foundation.org`
  - **Date**: `2023-03-31 16:10:04-07:00`
- **Message**:

```
*Body here…
Signed-off-by: Arnd Bergmann arnd@arndb.de
Reported-by: Guenter Roeck linux@roeck-us.net
Reported-by: Sudip Mukherjee sudipm.mukherjee@gmail.com
Reviewed-by: Manivannan Sadhasivam mani@kernel.org
Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
```

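Purely as an illustration (this is not the integration's actual code; the real mapping lives in `activitymap.py`), a minimal sketch of how trailers like `Signed-off-by` or `Reviewed-by` could be turned into separate activities. The helper name, the regex, and the activity-type strings are assumptions:

```
import re

# Hypothetical mapping from commit-message trailers to activity types (illustrative only).
TRAILER_ACTIVITY_TYPES = {
    "Signed-off-by": "signed-off-commit",
    "Reported-by": "reported-commit",
    "Reviewed-by": "reviewed-commit",
}

TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z-]+):\s*(?P<name>.+?)\s+(?P<email>\S+@\S+)$")

def parse_trailer_activities(message: str) -> list[dict]:
    """Extract one activity per recognised trailer line in a commit message."""
    activities = []
    for line in message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if not match or match["key"] not in TRAILER_ACTIVITY_TYPES:
            continue
        activities.append(
            {
                "type": TRAILER_ACTIVITY_TYPES[match["key"]],
                "member": {"name": match["name"], "email": match["email"]},
            }
        )
    return activities
```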
### The integration's flow

The integration runs every hour via a cron job. Each run does the following (see the sketch after this list):

- Retrieve the required remotes from crowd.dev. For each remote:
  - Use a semaphore to check whether parsing for the repository is already in progress:
    - If not running, proceed with the process.
    - If running, skip to the next repository.
  - Set the repository semaphore to "running".
  - Update an existing clone by pulling new commits, or clone the repository to get all commits if it is not already cloned.
  - Process each commit:
    - Extract activities and members from the commit and save them to a list.
    - If the repository is a GitHub repository, attempt to fetch the contributor's GitHub information based on the commit's SHA.
  - With the list of activities and members:
    - Split the list into chunks and forward them to the nodejs_worker for ingestion via SQS.
  - Remove the semaphore from the repository.

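For illustration, a condensed sketch of that loop. The helper callables are placeholders for logic that lives in `ingest.py`, `repo.py`, and `activity.py`; their names and the chunk size are assumptions:

```
from typing import Callable, Iterable

CHUNK_SIZE = 50  # assumed batch size for the SQS messages

def run_once(
    get_remotes: Callable[[], Iterable[str]],
    try_acquire_semaphore: Callable[[str], bool],
    release_semaphore: Callable[[str], None],
    clone_or_pull: Callable[[str], object],
    extract_records: Callable[[object], list],
    send_chunk_to_sqs: Callable[[list], None],
) -> None:
    """One hourly run: process every remote that is not already being parsed."""
    for remote in get_remotes():
        if not try_acquire_semaphore(remote):
            continue  # another run is already parsing this repo; skip it
        try:
            repo = clone_or_pull(remote)      # clone if missing, otherwise pull
            records = extract_records(repo)   # activities + members from new commits
            for i in range(0, len(records), CHUNK_SIZE):
                send_chunk_to_sqs(records[i : i + CHUNK_SIZE])
        finally:
            release_semaphore(remote)         # always clear the "running" flag
```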
### File breakdown

- `get_remotes.py`: gets the list of all repository remotes the integration needs.
- `repo.py`: performs several repo-related functions: cloning, extracting commits (and new commits since a date), getting insertions and deletions for a commit, etc.
- `activity.py`: gets the activities we need from a commit, using `activitymap.py` as a helper.
- `ingest.py`: the main controller file. It gets the remotes, ensures the repos are cloned, gets new activities from the commits, and sends SQS messages for ingestion.

## Deployment and remote access

### Accessing the instances

The instances are easily accessible through SSH. Contact [Joan](mailto:joan@crowd.dev) for credentials.

### Deploying new versions

To deploy new versions, SSH into the instance, go to the `git-integration` directory, and pull. If you need to update environment variables, run the following from inside the `git-integration` directory:

- `./install.sh` if you are in staging
- `./install.sh prod` if you are in production

## Scripts

There are two useful scripts to re-onboard repos. These do not live in this repository; they are in the root of the production instance (for now).

- `reonboard.sh`: given a remote URL, it performs a full re-onboard of the commits for that repository. This always starts from scratch, because it deletes and re-clones the repo.
- `reonboard-all.sh`: re-onboards all repositories. It only performs a full re-onboard for the repos that did not already exist in the instance, so if we have to stop the script halfway through for some reason, we can restart it without a problem.
# Git Integration V2

The Git integration is a Kubernetes-based service that processes Git repositories to extract contributor information from commits. It runs as worker pods that acquire repositories from the database queue, clone and process them, extract commits and maintainers, and send the processed data to Kafka for downstream ingestion.
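Purely as a sketch of the shape described above (none of these names come from the repo; the queue, topic, and helpers are hypothetical), a V2-style worker loop might look roughly like this:

```
import json
import time
from typing import Callable, Optional

def worker_loop(
    claim_next_repo: Callable[[], Optional[str]],    # pop a repo URL from the DB-backed queue
    process_repo: Callable[[str], list],              # clone/pull and extract commits + maintainers
    publish_to_kafka: Callable[[str, bytes], None],   # thin wrapper around a Kafka producer
    topic: str = "git.activities",                    # assumed topic name
) -> None:
    """Hypothetical V2 worker: repeatedly claim a repo, process it, publish the results."""
    while True:
        repo_url = claim_next_repo()
        if repo_url is None:
            time.sleep(30)  # nothing queued; back off before polling again
            continue
        for record in process_repo(repo_url):
            publish_to_kafka(topic, json.dumps(record).encode("utf-8"))
```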

0 commit comments
