# Topcoder Resources Data Migration Tool

This tool is designed to **migrate data from DynamoDB (JSON format) to PostgreSQL** using **Prisma ORM**. It covers the following key models of the Topcoder Resources API:

- `ResourceRole`
- `ResourceRolePhaseDependency`
- `Resource`

## 📦 Technologies Used
- **Node.js** (backend scripting)
- **Prisma ORM** (PostgreSQL schema management)
- **PostgreSQL 16.3** (Dockerized database)
- **Docker & Docker Compose** (for DB setup)
- **stream-json / readline** (for streaming JSON migration)
- **Jest** (unit testing framework)

## ⚙️ Environment Configuration
Create a `.env` file in the root directory:

```env
DATABASE_URL="postgresql://postgres:postgres@localhost:5432/resourcesdb"
CREATED_BY="resources-api-db-migration"
```

> The `CREATED_BY` field can be overridden at runtime:
```bash
CREATED_BY=eduardo node src/index.js member-stats ./data/MemberStats_test.json
```
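
For reference, a minimal sketch of how the entry point might resolve this value from the environment (the fallback string mirrors the `.env` example above; the exact variable handling inside `src/index.js` is an assumption):

```js
// Hypothetical sketch: resolve the audit "createdBy" value, falling back to the .env default
const createdBy = process.env.CREATED_BY || 'resources-api-db-migration';
console.log(`Using createdBy = ${createdBy}`);
```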

## 🚀 How to Run

This tool expects a running PostgreSQL instance defined in `docker-compose.yml`.

1. Clone the repo and install dependencies:

```bash
npm install
```

2. Start PostgreSQL with Docker Compose:

```bash
docker-compose up -d
```

To tear it down completely (including the volume):

```bash
docker-compose down -v
```

> The database runs on port `5432` with credentials `postgres:postgres`, and the database name is `resourcesdb`.
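
A minimal `docker-compose.yml` sketch matching these settings (the service and volume names are assumptions; the image, container name, credentials, port, and database name come from this README):

```yaml
services:
  postgres:
    image: postgres:16.3
    container_name: resources_postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: resourcesdb
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data  # removed by `docker-compose down -v`

volumes:
  pgdata:
```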

3. Push the Prisma schema to the database:

```bash
npx prisma db push
```

4. Run a migration step (with optional file override):

```bash
node src/index.js member-stats
node src/index.js resources ./data/challenge-api.resources.json
```

You can override the default `createdBy` value:

```bash
CREATED_BY=my-migrator node src/index.js member-profiles
```

## 🧩 Available Migration Steps

| Step | Auto Strategy | Description |
|------------------------------------|---------------|------------------------------------------------------------------------------------------------------|
| `member-profiles` | ✅ | Auto strategy: uses `stream-json` (batch) for files larger than 3MB, and `loadJSON` (simple) otherwise |

> - For **smaller files**, it defaults to **simple in-memory processing** (`loadJSON`) for faster performance.
>
> This approach strikes an optimal balance between **efficiency** and **stability**, especially when working with hundreds of thousands of records (e.g., over 850,000 for MemberProfile).
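
A minimal sketch of how this size-based switch might look (the 3MB threshold comes from the table above; the function name is illustrative, not the actual module API):

```js
const fs = require('fs');

// Hypothetical sketch: pick a processing strategy based on the input file size
function chooseStrategy(filePath, thresholdBytes = 3 * 1024 * 1024) {
  const { size } = fs.statSync(filePath);
  // Large exports are streamed in batches (stream-json); small ones are loaded in memory (loadJSON)
  return size > thresholdBytes ? 'batch' : 'simple';
}
```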

### 📁 Default Input Files per Migration Step

The following files are used by default for each step, unless a custom path is provided via the CLI:

| Step | Default File Path |
|------------------------------------|--------------------------------------------------------|
| `resource-roles` | `./data/ResourceRole_dynamo_data.json` |
| `resource-role-phase-dependencies` | `./data/ResourceRolePhaseDependency_dynamo_data.json` |
| `resources` | `./data/Resource_data.json` ← requires NDJSON format |

💡 **Note:** If you're using the original ElasticSearch export file (`challenge-api.resources.json`) provided in the forum ([link here](https://drive.google.com/file/d/1F8YW-fnKjn8tt5a0_Z-QenZIHPiP3RK7/view?usp=sharing)), you must explicitly provide its path when running the migration:

```bash
node src/index.js resources ./data/challenge-api.resources.json
```

### 🔁 Run All Migrations at Once

You can now run all migration steps sequentially using the following script:

```bash
node src/migrateAll.js
```

This script automatically executes each step in order (`resource-roles`, `resource-role-phase-dependencies`, `resources`), logging progress and duration for each, making it ideal for a full dataset migration in one command.
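
A minimal sketch of what such a sequential runner might look like (this shells out to the documented CLI; the actual `migrateAll.js` implementation may differ):

```js
// Hypothetical sketch: run every migration step in order and report its duration
const { execSync } = require('child_process');

const steps = ['resource-roles', 'resource-role-phase-dependencies', 'resources'];

for (const step of steps) {
  const start = Date.now();
  console.log(`▶ Starting step: ${step}`);
  execSync(`node src/index.js ${step}`, { stdio: 'inherit' });
  console.log(`✔ Finished ${step} in ${((Date.now() - start) / 1000).toFixed(1)}s`);
}
```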

## 📒 Error Logs
All failed migrations are logged under the `logs/` folder by model:

- `logs/rrpd_errors.log` ← from `ResourceRolePhaseDependency_dynamo_data.json` *(17 migrations failed)*

> ✅ Most migrations complete successfully. Errors are logged for further review and debugging.
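
A minimal sketch of how a failed record might be appended to one of these per-model log files (the helper name and entry format are assumptions for illustration):

```js
const fs = require('fs');
const path = require('path');

// Hypothetical sketch: append one failed record and its error to a per-model log file
function logMigrationError(logFile, record, error) {
  fs.mkdirSync(path.dirname(logFile), { recursive: true });
  const entry = JSON.stringify({ id: record.id, error: error.message, at: new Date().toISOString() });
  fs.appendFileSync(logFile, entry + '\n');
}

// Example: logMigrationError('logs/rrpd_errors.log', record, err);
```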

## ✅ Verification
You can verify successful migration with simple SQL queries, for example:
```sql
SELECT COUNT(*) FROM "Resource";
```
To connect:
```bash
docker exec -it resources_postgres psql -U postgres -d resourcesdb
```

## 📸 Screenshots
See `/docs/` for evidence of a fully populated database.


## 🧪 Testing

Run all test suites with:

```bash
npm test
```

Each migrator has a corresponding unit test with mock input files under `src/test/mocks/`. Jest is used as the testing framework.
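
A minimal sketch of the shape such a test might take (the module path, exported function, and mock file name are hypothetical; they only illustrate the pattern of running a migrator against a small mock file):

```js
// Hypothetical sketch of a migrator unit test (names are illustrative, not the actual API)
const { migrateResourceRoles } = require('../migrators/resourceRoles');

describe('resource-roles migrator', () => {
  it('migrates all records from the mock file without errors', async () => {
    const result = await migrateResourceRoles('./src/test/mocks/ResourceRole_mock.json');
    expect(result.failed).toHaveLength(0);
  });
});
```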

---

### 📂 Data Files Not Included

The official DynamoDB dataset files provided in the forum (e.g., `challenge-api.resources.json`) are **not included** in this submission due to size constraints.

Please download them manually from the official challenge forum and place them under the `/data/` directory.

🔗 [Official Data Files (Google Drive)](https://drive.google.com/file/d/1F8YW-fnKjn8tt5a0_Z-QenZIHPiP3RK7/view?usp=sharing)

> 🧪 This project **includes lightweight mock data files** under `src/test/mocks/` for testing purposes and sample execution. Full data is only required for production migration.

---

✅ All requirements of the challenge have been implemented, including logs, unit tests, schema adherence, and configurability.

## 🔧 Integrated Fixes & Enhancements

Several improvements and refinements have been implemented throughout the migration tool to ensure performance, reliability, and clarity:

### ✅ Progress Bar for Batch Processes

A custom CLI progress bar was added using `utils/progressLogger.js`. It applies only to **batch-based migrations** and provides a visual representation of migration progress based on the total number of records or batches processed (a simplified sketch follows the list below):
- Implemented for: `resources`
- Skipped for small or in-memory migrations like `resource-roles` and `resource-role-phase-dependencies`
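
A minimal sketch of the kind of single-line progress output this refers to (the function shape is an assumption, not the actual `progressLogger` API):

```js
// Hypothetical sketch: render a single-line progress bar on stdout
function logProgress(processed, total, width = 30) {
  const ratio = total > 0 ? processed / total : 0;
  const filled = Math.round(ratio * width);
  const bar = '█'.repeat(filled) + '░'.repeat(width - filled);
  process.stdout.write(`\r[${bar}] ${processed}/${total} (${(ratio * 100).toFixed(1)}%)`);
  if (processed >= total) process.stdout.write('\n');
}
```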

### ✅ Validation for All Models

Additional validation scripts were also developed for:
- `Resource`
- `ResourceRole`
- `ResourceRolePhaseDependency`

While binary search was not applicable for these due to non-numeric or unordered IDs, the validation was still efficiently implemented using `Map`-based lookups with the `id` as the key.
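
A minimal sketch of this lookup approach, assuming the validator compares a source JSON export against rows read back from PostgreSQL via Prisma (names are illustrative):

```js
// Hypothetical sketch: index migrated rows by id, then report source records that never made it over
function findMissingRecords(sourceRecords, migratedRows) {
  const migratedById = new Map(migratedRows.map((row) => [row.id, row]));
  return sourceRecords.filter((record) => !migratedById.has(record.id));
}

// Example: findMissingRecords(jsonExport, await prisma.resource.findMany()).length
```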

### ✅ Cleaner Code & Utility Reuse

A reusable utility module `utils/batchMigrator.js` was created to consolidate the following logic (a simplified sketch appears after the lists below):
- Streamed reading of large JSON and NDJSON files
- Batch-based record processing with `Promise.allSettled`
- Progress tracking and error logging
- Automatic detection of input file format and size

This approach:
- Avoids code duplication
- Allows for consistent logging and error handling
- Simplifies future extensions
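
A condensed sketch of the batch-processing core this utility consolidates (the function signature is an assumption; it only illustrates the `Promise.allSettled` batching and error-collection pattern):

```js
// Hypothetical sketch: process records in fixed-size batches and collect failures
async function processInBatches(records, handleRecord, batchSize = 500) {
  const failures = [];
  for (let i = 0; i < records.length; i += batchSize) {
    const batch = records.slice(i, i + batchSize);
    const results = await Promise.allSettled(batch.map(handleRecord));
    results.forEach((result, index) => {
      if (result.status === 'rejected') {
        failures.push({ record: batch[index], reason: result.reason });
      }
    });
    console.log(`Processed ${Math.min(i + batchSize, records.length)}/${records.length}`);
  }
  return failures;
}
```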

### ✅ Default Field Logic (createdAt, updatedAt, etc.)

- Fields like `createdAt`, `updatedAt`, `createdBy`, and `updatedBy` are now conditionally set based on whether values exist in the original JSON (see the sketch below).
- If `updatedAt` or `updatedBy` is missing from the source, it is explicitly set to `null` rather than omitted or auto-filled, which preserves data integrity.
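
A minimal sketch of this conditional mapping (field names follow the bullets above; the surrounding record shape and the `createdAt` fallback are assumptions):

```js
// Hypothetical sketch: map audit fields, setting updated* to null when the source omits them
function mapAuditFields(item, defaultCreatedBy) {
  return {
    createdAt: item.createdAt ? new Date(item.createdAt) : new Date(), // assumption: fall back to "now"
    createdBy: item.createdBy || defaultCreatedBy,                     // CREATED_BY from .env by default
    updatedAt: item.updatedAt ? new Date(item.updatedAt) : null,       // explicit null, not omitted
    updatedBy: item.updatedBy || null,
  };
}
```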

### ✅ FullAccess Compatibility Fix

In `ResourceRole`, the original dataset sometimes includes only a `fullAccess` flag instead of `fullReadAccess` or `fullWriteAccess`.

Logic was added to:
- Derive `fullReadAccess` and `fullWriteAccess` from `fullAccess` when the specific fields are missing.
- Ensure fallback to `.env` defaults only if neither is provided.

```js
const fullReadAccess = role.fullReadAccess ?? (role.fullAccess ?? DEFAULT_READ_ACCESS);
const fullWriteAccess = role.fullWriteAccess ?? (role.fullAccess ?? DEFAULT_WRITE_ACCESS);
```

> 🚩 **Important Note:** Some records in the source data had `fullWriteAccess: true` but `fullReadAccess: false`, which is logically inconsistent. This was **not auto-corrected**, but a warning was added in the README for awareness during validation.

### 📄 Validation Logs

All validation scripts write their outputs and mismatches to `console.log`. You can redirect them to a file using:

```bash
node src/validation/validateMemberProfiles.js > logs/memberprofile_validation.log
```

---