# Topcoder Resources Data Migration Tool

This tool is designed to **migrate data from DynamoDB (JSON format) to PostgreSQL** using **Prisma ORM**. It covers five key models of the Topcoder Resources API:

- `MemberProfile`
- `MemberStats`
- `ResourceRole`
- `ResourceRolePhaseDependency`
- `Resource`

## 📦 Technologies Used
- **Node.js** (backend scripting)
- **Prisma ORM** (PostgreSQL schema management)
- **PostgreSQL 16.3** (Dockerized database)
- **Docker & Docker Compose** (for DB setup)
- **stream-json / readline** (for streaming JSON migration)
- **Jest** (unit testing framework)

## ⚙️ Environment Configuration
Create a `.env` file in the root directory:

```env
DATABASE_URL="postgresql://postgres:postgres@localhost:5432/resourcesdb"
CREATED_BY="resources-api-db-migration"
```

> The `CREATED_BY` field can be overridden at runtime:

```bash
CREATED_BY=eduardo node src/index.js member-stats ./data/MemberStats_test.json
```
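
Internally, the migrators can resolve this value with a plain environment lookup. A minimal sketch, assuming a hypothetical `src/config.js` helper and the `dotenv` package (neither is confirmed by this README):

```js
// Hypothetical src/config.js — a sketch, not the actual implementation.
require('dotenv').config(); // load .env into process.env (assumes dotenv is installed)

const DEFAULT_CREATED_BY = 'resources-api-db-migration';

// Prefer the runtime override (CREATED_BY=... on the command line),
// fall back to the .env default.
const createdBy = process.env.CREATED_BY || DEFAULT_CREATED_BY;

module.exports = { createdBy };
```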

## 🚀 How to Run

This tool expects a running PostgreSQL instance defined in `docker-compose.yml`.

1. Clone the repo and install dependencies:

```bash
npm install
```

2. Start PostgreSQL with Docker Compose:

```bash
docker-compose up -d
```

To tear it down completely (including the volume):

```bash
docker-compose down -v
```
> The database listens on port `5432` with credentials `postgres:postgres`; the database name is `resourcesdb`.
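
For reference, the service definition is expected to look roughly like this — a sketch reconstructed from the image version, credentials, and container name mentioned elsewhere in this README, not a verbatim copy of the project's `docker-compose.yml`:

```yaml
# Sketch of docker-compose.yml (assumed; values taken from this README).
services:
  postgres:
    image: postgres:16.3
    container_name: resources_postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: resourcesdb
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```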

3. Push the Prisma schema to the database:

```bash
npx prisma db push
```
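
Once the schema is pushed, Prisma Client handles the inserts. A minimal sketch of the batch-insert pattern a migrator can rely on (the model shown is one of the five listed above; the wrapper function is illustrative):

```js
const { PrismaClient } = require('@prisma/client');

const prisma = new PrismaClient();

async function insertBatch(records) {
  // createMany inserts the whole batch in one round trip;
  // skipDuplicates avoids aborting on records that already exist.
  await prisma.resourceRole.createMany({
    data: records,
    skipDuplicates: true,
  });
}

module.exports = { insertBatch };
```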

4. Run a migration step (with an optional file override):

```bash
node src/index.js member-stats
node src/index.js resources ./data/challenge-api.resources.json
```

You can override the default `createdBy` value:

```bash
CREATED_BY=my-migrator node src/index.js member-profiles
```
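
Internally, `src/index.js` can dispatch on the step name and optional path roughly like this — a sketch only; the migrator module names are assumptions:

```js
// Sketch of the CLI entry point: node src/index.js <step> [filePath]
const [step, filePath] = process.argv.slice(2);

// Hypothetical mapping of step names to migrator modules and default files.
const steps = {
  'member-profiles': {
    run: require('./migrations/memberProfiles'),
    file: './data/MemberProfile_dynamo_data.json',
  },
  'member-stats': {
    run: require('./migrations/memberStats'),
    file: './data/MemberStats_dynamo_data.json',
  },
  // ...the remaining steps follow the same shape
};

const entry = steps[step];
if (!entry) {
  console.error(`Unknown step: ${step}`);
  process.exit(1);
}

entry.run(filePath || entry.file)
  .catch((err) => { console.error(err); process.exit(1); });
```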

## 🧩 Available Migration Steps

| Step                                | Auto Strategy | Description                                                                                 |
|-------------------------------------|---------------|---------------------------------------------------------------------------------------------|
| `member-profiles`                   | ✅            | `stream-json` (batch) for files larger than 3 MB, `loadJSON` (in-memory) otherwise           |
| `member-stats`                      | ✅            | `stream-json` (batch) for files larger than 3 MB, `loadJSON` (in-memory) otherwise           |
| `resource-roles`                    | ❌            | Simple in-memory migration using `loadJSON`; dataset not expected to be large                |
| `resource-role-phase-dependencies`  | ❌            | Simple in-memory migration using `loadJSON`; dataset not expected to be large                |
| `resources`                         | ✅            | NDJSON input: `readline` + batch for files larger than 3 MB, simple line-by-line otherwise   |

> ⚙️ **Why Auto Strategy?**
>
> For models that involve large datasets (`member-profiles`, `member-stats`, and `resources`), the tool selects a migration strategy automatically based on file size:
> - If the input file is **larger than 3 MB**, the migration runs in **batch mode using streaming** (`stream-json` or `readline`) to keep memory usage low.
> - For **smaller files**, it defaults to **simple in-memory processing** (`loadJSON`), which is faster.
>
> This keeps the migration both **efficient** and **stable**, especially when working with hundreds of thousands of records (e.g., over 850,000 for `MemberProfile`).
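
Concretely, the selection can be as simple as a file-size check before choosing a code path. A minimal sketch (the 3 MB threshold comes from the table above; the two strategy functions are illustrative parameters):

```js
const fs = require('fs');

const SIZE_THRESHOLD = 3 * 1024 * 1024; // 3 MB, per the table above

async function migrateAuto(filePath, { streamMigrate, simpleMigrate }) {
  const { size } = fs.statSync(filePath);
  if (size > SIZE_THRESHOLD) {
    // Large file: stream in batches (stream-json / readline) to cap memory.
    return streamMigrate(filePath);
  }
  // Small file: load everything into memory (loadJSON) for speed.
  return simpleMigrate(filePath);
}

module.exports = { migrateAuto };
```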

### 📁 Default Input Files per Migration Step

The following files are used by default for each step, unless a custom path is provided via the CLI:

| Step                                | Default File Path                                       |
|-------------------------------------|---------------------------------------------------------|
| `member-profiles`                   | `./data/MemberProfile_dynamo_data.json`                 |
| `member-stats`                      | `./data/MemberStats_dynamo_data.json`                   |
| `resource-roles`                    | `./data/ResourceRole_dynamo_data.json`                  |
| `resource-role-phase-dependencies`  | `./data/ResourceRolePhaseDependency_dynamo_data.json`   |
| `resources`                         | `./data/Resource_data.json` (NDJSON format required)    |

💡 **Note:** If you're using the original ElasticSearch export file (`challenge-api.resources.json`) provided in the forum ([link here](https://drive.google.com/file/d/1F8YW-fnKjn8tt5a0_Z-QenZIHPiP3RK7/view?usp=sharing)), you must explicitly provide its path when running the migration:

```bash
node src/index.js resources ./data/challenge-api.resources.json
```
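
For NDJSON input (one JSON object per line), the streaming path can be built on Node's `readline`. A minimal sketch, with an illustrative batch size:

```js
const fs = require('fs');
const readline = require('readline');

async function streamNdjson(filePath, handleBatch, batchSize = 500) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let batch = [];
  for await (const line of rl) {
    if (!line.trim()) continue;   // skip blank lines
    batch.push(JSON.parse(line)); // NDJSON: each line is one JSON object
    if (batch.length >= batchSize) {
      await handleBatch(batch);
      batch = [];
    }
  }
  if (batch.length) await handleBatch(batch); // flush the final partial batch
}
```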

## 📒 Error Logs
All failed records are logged under the `logs/` folder, one file per model:

- `logs/memberprofile_errors.log` ← from `MemberProfile_dynamo_data.json` *(7 failed records)*
- `logs/memberstats_errors.log` ← from `MemberStats_dynamo_data.json` *(1 failed record)*
- `logs/rrpd_errors.log` ← from `ResourceRolePhaseDependency_dynamo_data.json` *(17 failed records)*

> ✅ The vast majority of records migrate successfully; failures are logged for later review and debugging.
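
The per-model log files can be produced by appending one line per failed record. A minimal sketch (the file naming follows the list above; the record shape is illustrative):

```js
const fs = require('fs');
const path = require('path');

function logError(model, record, err) {
  fs.mkdirSync('logs', { recursive: true }); // ensure logs/ exists
  const logFile = path.join('logs', `${model}_errors.log`);
  // One JSON line per failure keeps the log easy to grep and re-process.
  const entry = JSON.stringify({ id: record.id, error: err.message });
  fs.appendFileSync(logFile, entry + '\n');
}
```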

## ✅ Verification
You can verify a successful migration with simple SQL queries, for example:

```sql
SELECT COUNT(*) FROM "MemberProfile";
SELECT COUNT(*) FROM "Resource";
```

To connect:

```bash
docker exec -it resources_postgres psql -U postgres -d resourcesdb
```
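
Equivalently, the counts can be checked from Node with Prisma Client — a quick sketch mirroring the SQL queries above:

```js
const { PrismaClient } = require('@prisma/client');

const prisma = new PrismaClient();

async function main() {
  // Row counts mirror the SQL queries above.
  console.log('MemberProfile:', await prisma.memberProfile.count());
  console.log('Resource:', await prisma.resource.count());
}

main().finally(() => prisma.$disconnect());
```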

## 📸 Screenshots
See `/docs/` for screenshots showing the fully migrated database.


## 🧪 Testing

Run all test suites with:

```bash
npm test
```

Each migrator has a corresponding unit test with mock input files under `src/test/mocks/`. Jest is used as the testing framework.
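
A test for the auto strategy, for instance, can feed a small mock file and assert that the in-memory path is taken. A minimal Jest sketch (the `migrateAuto` helper matches the earlier sketch, and the mock file path is illustrative and assumed to exist under `src/test/mocks/`):

```js
// Sketch of a Jest unit test; module path and mock file are assumptions.
const { migrateAuto } = require('../migrateAuto');

test('small files use the simple in-memory path', async () => {
  const simpleMigrate = jest.fn().mockResolvedValue(undefined);
  const streamMigrate = jest.fn();

  // The mock file is well under the 3 MB threshold, so the
  // simple (in-memory) strategy should be selected.
  await migrateAuto('./src/test/mocks/MemberStats_small.json', {
    streamMigrate,
    simpleMigrate,
  });

  expect(simpleMigrate).toHaveBeenCalled();
  expect(streamMigrate).not.toHaveBeenCalled();
});
```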

---

### 📂 Data Files Not Included

The official DynamoDB dataset files provided in the forum (e.g., `MemberProfile_dynamo_data.json`, `challenge-api.resources.json`, etc.) are **not included** in this submission due to size constraints.

Please download them manually from the official challenge forum and place them under the `/data/` directory.

🔗 [Official Data Files (Google Drive)](https://drive.google.com/file/d/1F8YW-fnKjn8tt5a0_Z-QenZIHPiP3RK7/view?usp=sharing)

> 🧪 This project **includes lightweight mock data files** under `src/test/mocks/` for testing purposes and sample execution. Full data is only required for production migration.

---

✅ All requirements of the challenge have been implemented, including logs, unit tests, schema adherence, and configurability.