-The Data Loader is a built-in component that provides a reliable mechanism for loading data from JSON or YAML files into Harper tables as part of component deployment. This feature is particularly useful for ensuring specific records exist in your database when deploying components, such as seed data, configuration records, or initial application data.
+Now that you’ve set up your first application, let’s bring it to life with some data. Applications are only as useful as the information they hold, and Harper makes it simple to seed your database with initial records, configuration values, or even test users, without needing to write a custom script. This is where the Data Loader comes in.
 
-## Configuration
+Think of the Data Loader as your shortcut for putting essential data in place from day one. Whether it’s a set of default settings, an admin user account, or sample data for development, the Data Loader ensures that when your application is deployed, it’s immediately usable.
 
-To use the Data Loader, first specify your data files in the `config.yaml` in your component directory:
+In this section, we’ll add a few dogs to our `Dog` table so our application starts with meaningful data.
 
-```yaml
-dataLoader:
-  files: 'data/*.json'
-```
-
-The Data Loader is an [Extension](../../reference/components#extensions) and supports the standard `files` configuration option.
+## Creating a Data File
 
-## Data File Format
-
-Data files can be structured as either JSON or YAML files containing the records you want to load. Each data file must specify records for a single table - if you need to load data into multiple tables, create separate data files for each table.
-
-### Basic Example
-
-Create a data file in your component's data directory (one table per file):
+First, let’s make a `data` directory in our app and create a file called `dogs.json`:
 
 ```json
 {
   "database": "myapp",
-  "table": "users",
+  "table": "Dog",
   "records": [
     {
       "id": 1,
-      "username": "admin",
-      "email": "admin@example.com",
-      "role": "administrator"
+      "name": "Harper",
+      "breed": "Labrador",
+      "age": 3,
+      "tricks": ["sit"]
     },
     {
       "id": 2,
-      "username": "user1",
-      "email": "user1@example.com",
-      "role": "standard"
+      "name": "Balto",
+      "breed": "Husky",
+      "age": 5,
+      "tricks": ["run", "pull sled"]
     }
   ]
 }
 ```
 
-### Multiple Tables
+This file tells Harper: _“Insert these two records into the `Dog` table when this app runs.”_
 
-To load data into multiple tables, create separate data files for each table:
+## Connecting the Data Loader
 
-**users.json:**
-
-```json
-{
-  "database": "myapp",
-  "table": "users",
-  "records": [
-    {
-      "id": 1,
-      "username": "admin",
-      "email": "admin@example.com"
-    }
-  ]
-}
-```
-
-**settings.yaml:**
-
-```yaml
-database: myapp
-table: settings
-records:
-  - id: 1
-    setting_name: app_name
-    setting_value: My Application
-  - id: 2
-    setting_name: version
-    setting_value: '1.0.0'
-```
-
-## File Organization
-
-You can organize your data files in various ways:
-
-### Single File Pattern
+Next, let’s tell Harper to use this file when running the application. Open `config.yaml` in the root of your project and add:
 
 ```yaml
 dataLoader:
-  files: 'data/seed-data.json'
+  files: 'data/dogs.json'
 ```
 
-### Multiple Files Pattern
+That’s it. Now the Data Loader knows where to look.
 
-```yaml
-dataLoader:
-  files:
-    - 'data/users.json'
-    - 'data/settings.yaml'
-    - 'data/initial-products.json'
-```
+## Running with Data
 
-### Glob Pattern
+Go ahead and start your app again:
 
-```yaml
-dataLoader:
-  files: 'data/**/*.{json,yaml,yml}'
+```bash
+harperdb dev .
 ```
 
-## Loading Behavior
+This time, when Harper runs, it will automatically read `dogs.json` and load the records into the `Dog` table. You don’t need to write any import scripts or SQL statements; it just works.
 
-When Harper starts up with a component that includes the Data Loader:
+You can confirm the data is there by hitting the endpoint you created earlier:
 
-1. The Data Loader reads all specified data files (JSON or YAML)
-1. For each file, it validates that a single table is specified
-1. Records are inserted or updated based on timestamp comparison:
-   - New records are inserted if they don't exist
-   - Existing records are updated only if the data file's modification time is newer than the record's updated time
-   - This ensures data files can be safely reloaded without overwriting newer changes
-1. If records with the same primary key already exist, updates occur only when the file is newer
-
-Note: While the Data Loader can create tables automatically by inferring the schema from the provided records, it's recommended to define your table schemas explicitly using the [graphqlSchema](../applications/defining-schemas) component for better control and type safety.
-
-## Best Practices
+```bash
+curl http://localhost:9926/Dog/
+```
 
-1. **Define Schemas First**: While the Data Loader can infer schemas, it's strongly recommended to define your table schemas and relations explicitly using the [graphqlSchema](../applications/defining-schemas) component before loading data. This ensures proper data types, constraints, and relationships between tables.
+You should see both `Harper` and `Balto` returned as JSON.
 
-1. **One Table Per File**: Remember that each data file can only load records into a single table. Organize your files accordingly.
+### Updating Records
 
-1. **Idempotency**: Design your data files to be idempotent - they should be safe to load multiple times without creating duplicate or conflicting data.
+What happens if you change the data file? Let’s update Harper’s age from 3 to 4 in `dogs.json`:
 
-1. **Version Control**: Include your data files in version control to ensure consistency across deployments.
+```json
+{
+  "id": 1,
+  "name": "Harper",
+  "breed": "Labrador",
+  "age": 4,
+  "tricks": ["sit"]
+}
+```
 
-1. **Environment-Specific Data**: Consider using different data files for different environments (development, staging, production).
+When you save the file, Harper will notice the change and reload. The next time you query the endpoint, Harper’s age will be updated.
 
-1. **Data Validation**: Ensure your data files are valid JSON or YAML and match your table schemas before deployment.
+The Data Loader is designed to be safe and repeatable. Records are matched by primary key, and an existing record is only updated when the data file is newer than that record’s last update. This means you can re-run deployments without creating duplicates or overwriting more recent changes.
 
-1. **Sensitive Data**: Avoid including sensitive data like passwords or API keys directly in data files. Use environment variables or secure configuration management instead.
+### Adding More Tables
 
-## Example Component Structure
+If your app grows and you want to seed more than just dogs, you can create additional files. For example, a `settings.yaml` file:
 
-```
-my-component/
-├── config.yaml
-├── data/
-│   ├── users.json
-│   ├── roles.json
-│   └── settings.json
-├── schemas.graphql
-└── roles.yaml
+```yaml
+database: myapp
+table: Settings
+records:
+  - id: 1
+    setting_name: app_name
+    setting_value: Dog Tracker
+  - id: 2
+    setting_name: version
+    setting_value: '1.0.0'
 ```
 
-With this structure, your `config.yaml` might look like:
+Then add it to your config:
 
 ```yaml
-# Load environment variables first
-loadEnv:
-  files: '.env'
+dataLoader:
+  files:
+    - 'data/dogs.json'
+    - 'data/settings.yaml'
+```
 
-# Define schemas
-graphqlSchema:
-  files: 'schemas.graphql'
+Harper will read both files and load them into their respective tables.
 
-# Define roles
-roles:
-  files: 'roles.yaml'
+## Key Takeaway
 
-# Load initial data
-dataLoader:
-  files: 'data/*.json'
+With the Data Loader, your app doesn’t start empty. It starts ready to use. You define your schema, write a simple data file, and Harper takes care of loading it. This keeps your applications consistent across environments, safe to redeploy, and quick to get started with.
 
-# Enable REST endpoints
-rest: true
-```
+In just a few steps, we’ve gone from an empty Dog table to a real application with data that’s instantly queryable.
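The `curl` check in the new "Running with Data" section can be scripted as a quick smoke test. The helper below is a hypothetical sketch and not a Harper or curl feature. It assumes the `/Dog/` response contains each record's `name` field as written in `dogs.json`. It also pattern-matches the JSON text instead of parsing it.

```bash
#!/usr/bin/env bash
# Hypothetical smoke-test helper (not part of Harper or curl).
# Reads a JSON response on stdin and returns 0 only if every name passed as an
# argument appears as a "name" value. It matches text rather than parsing JSON,
# and each name is used inside a regular expression, so keep names to plain words.
expect_names() {
  local body name status=0
  body="$(cat)"
  for name in "$@"; do
    # Tolerate optional whitespace around the colon: "name":"Balto" or "name": "Balto".
    if ! grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${name}\"" <<<"$body"; then
      echo "missing: ${name}" >&2
      status=1
    fi
  done
  return "$status"
}
```

Consider this pipeline against the endpoint used in the diff: `curl -s http://localhost:9926/Dog/ | expect_names Harper Balto`. It exits with status 0 only when both seeded dogs come back. For anything beyond a smoke test, a real JSON parser is the safer choice.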
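The removed "Loading Behavior" list states a loading rule, and the new "Updating Records" text summarizes it. That rule can be modeled in a few lines of bash. This is a toy, in-memory sketch rather than Harper's implementation. The following are all assumptions made for illustration:

- the two associative arrays;
- the `load_record` helper;
- the explicit `now` argument standing in for a record's write time.

Bash 4 or later is required for associative arrays.

```bash
#!/usr/bin/env bash
# Toy model of the Data Loader's insert/update rule (hypothetical, not Harper's code).
# A "table" is two associative arrays keyed by primary key; requires bash 4+.
declare -A dog_data=()     # id -> record payload
declare -A dog_updated=()  # id -> when the record was last written (epoch seconds)

# load_record NOW FILE_MTIME ID PAYLOAD
#   NOW        assumed write time, stored as the record's updated time
#   FILE_MTIME modification time of the data file the record came from
load_record() {
  local now="$1" file_mtime="$2" id="$3" payload="$4"
  if [[ -z "${dog_data[$id]+set}" ]]; then
    # New primary key: always insert.
    dog_data[$id]="$payload"; dog_updated[$id]="$now"; echo "insert $id"
  elif (( file_mtime > ${dog_updated[$id]} )); then
    # Existing record and the file is newer: update it.
    dog_data[$id]="$payload"; dog_updated[$id]="$now"; echo "update $id"
  else
    # Record was written after the file changed: keep the newer data.
    echo "skip $id"
  fi
}

# First deploy: dogs.json (mtime 100) is loaded at t=200.
load_record 200 100 1 "Harper,3"   # prints: insert 1
load_record 200 100 2 "Balto,5"    # prints: insert 2
# Redeploy with the unchanged file at t=300: no duplicates, no rewrite.
load_record 300 100 1 "Harper,3"   # prints: skip 1
# Harper's age is edited in the file (mtime 400) and reloaded at t=500.
load_record 500 400 1 "Harper,4"   # prints: update 1
# Simulate a direct change to Balto at t=600, then reload the file (mtime 400) at t=700.
dog_data[2]="Balto,6"; dog_updated[2]=600
load_record 700 400 2 "Balto,5"    # prints: skip 2
```

Records are keyed by primary key, so replaying the same file never adds rows. The timestamp comparison alone decides between `update` and `skip`.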