Commit d657346

Create new data loader page
1 parent 11c6865 commit d657346

2 files changed

+244
-121
lines changed

versioned_docs/version-4.6/developers/applications/data-loader.md

Lines changed: 65 additions & 121 deletions
Original file line number | Diff line number | Diff line change
@@ -4,178 +4,122 @@ title: Data Loader
44

55
# Data Loader
66

7-
The Data Loader is a built-in component that provides a reliable mechanism for loading data from JSON or YAML files into Harper tables as part of component deployment. This feature is particularly useful for ensuring specific records exist in your database when deploying components, such as seed data, configuration records, or initial application data.
7+
Now that you’ve set up your first application, let’s bring it to life with some data. Applications are only as useful as the information they hold, and Harper makes it simple to seed your database with initial records, configuration values, or even test users, without needing to write a custom script. This is where the Data Loader comes in.
88

9-
## Configuration
9+
Think of the Data Loader as your shortcut for putting essential data in place from day one. Whether it’s a set of default settings, an admin user account, or sample data for development, the Data Loader ensures that when your application is deployed, it’s immediately usable.
1010

11-
To use the Data Loader, first specify your data files in the `config.yaml` in your component directory:
11+
In this section, we’ll add a few dogs to our `Dog` table so our application starts with meaningful data.
1212

13-
```yaml
14-
dataLoader:
15-
files: 'data/*.json'
16-
```
17-
18-
The Data Loader is an [Extension](../../reference/components#extensions) and supports the standard `files` configuration option.
13+
## Creating a Data File
1914

20-
## Data File Format
21-
22-
Data files can be structured as either JSON or YAML files containing the records you want to load. Each data file must specify records for a single table - if you need to load data into multiple tables, create separate data files for each table.
23-
24-
### Basic Example
25-
26-
Create a data file in your component's data directory (one table per file):
15+
First, let’s make a `data` directory in our app and create a file called `dogs.json`:
2716

2817
```json
2918
{
3019
"database": "myapp",
31-
"table": "users",
20+
"table": "Dog",
3221
"records": [
3322
{
3423
"id": 1,
35-
"username": "admin",
36-
"email": "admin@example.com",
37-
"role": "administrator"
24+
"name": "Harper",
25+
"breed": "Labrador",
26+
"age": 3,
27+
"tricks": ["sit"]
3828
},
3929
{
4030
"id": 2,
41-
"username": "user1",
42-
"email": "user1@example.com",
43-
"role": "standard"
31+
"name": "Balto",
32+
"breed": "Husky",
33+
"age": 5,
34+
"tricks": ["run", "pull sled"]
4435
}
4536
]
4637
}
4738
```
4839

49-
### Multiple Tables
40+
This file tells Harper: _“Insert these two records into the `Dog` table when this app runs.”_
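
Before deploying, it can help to sanity-check the data file's shape. The following is a minimal sketch, not part of Harper: `check_data_file` is a hypothetical helper, and it assumes that `table` and `records` are the keys the loader needs and that `id` serves as the primary key, as in the example above.

```python
import json

# Sample document mirroring dogs.json from the example above.
DOGS_JSON = """
{
  "database": "myapp",
  "table": "Dog",
  "records": [
    {"id": 1, "name": "Harper", "breed": "Labrador", "age": 3, "tricks": ["sit"]},
    {"id": 2, "name": "Balto", "breed": "Husky", "age": 5, "tricks": ["run", "pull sled"]}
  ]
}
"""


def check_data_file(text: str) -> tuple[str, int]:
    """Parse a data file and return (table name, record count).

    Hypothetical helper: the required keys and the use of "id" as the
    primary key are assumptions made for illustration.
    """
    doc = json.loads(text)
    for key in ("table", "records"):
        if key not in doc:
            raise ValueError(f"missing required key: {key!r}")
    if not isinstance(doc["records"], list):
        raise ValueError("'records' must be a list")
    # Each record should carry a distinct primary key.
    ids = [record.get("id") for record in doc["records"]]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate 'id' values in records")
    return doc["table"], len(doc["records"])


if __name__ == "__main__":
    table, count = check_data_file(DOGS_JSON)
    print(f"{count} records for table {table}")
```

A check like this could run in CI so that a malformed file is caught before the application starts.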
5041

51-
To load data into multiple tables, create separate data files for each table:
42+
## Connecting the Data Loader
5243

53-
**users.json:**
54-
55-
```json
56-
{
57-
"database": "myapp",
58-
"table": "users",
59-
"records": [
60-
{
61-
"id": 1,
62-
"username": "admin",
63-
"email": "admin@example.com"
64-
}
65-
]
66-
}
67-
```
68-
69-
**settings.yaml:**
70-
71-
```yaml
72-
database: myapp
73-
table: settings
74-
records:
75-
- id: 1
76-
setting_name: app_name
77-
setting_value: My Application
78-
- id: 2
79-
setting_name: version
80-
setting_value: '1.0.0'
81-
```
82-
83-
## File Organization
84-
85-
You can organize your data files in various ways:
86-
87-
### Single File Pattern
44+
Next, let’s tell Harper to use this file when running the application. Open `config.yaml` in the root of your project and add:
8845

8946
```yaml
9047
dataLoader:
91-
files: 'data/seed-data.json'
48+
files: 'data/dogs.json'
9249
```
9350
94-
### Multiple Files Pattern
51+
That’s it. Now the Data Loader knows where to look.
9552
96-
```yaml
97-
dataLoader:
98-
files:
99-
- 'data/users.json'
100-
- 'data/settings.yaml'
101-
- 'data/initial-products.json'
102-
```
53+
## Running with Data
10354
104-
### Glob Pattern
55+
Go ahead and start your app again:
10556
106-
```yaml
107-
dataLoader:
108-
files: 'data/**/*.{json,yaml,yml}'
57+
```bash
58+
harperdb dev .
10959
```
11060

111-
## Loading Behavior
61+
This time, when Harper runs, it will automatically read `dogs.json` and load the records into the `Dog` table. You don’t need to write any import scripts or SQL statements; it just works.
11262

113-
When Harper starts up with a component that includes the Data Loader:
63+
You can confirm the data is there by hitting the endpoint you created earlier:
11464

115-
1. The Data Loader reads all specified data files (JSON or YAML)
116-
1. For each file, it validates that a single table is specified
117-
1. Records are inserted or updated based on timestamp comparison:
118-
- New records are inserted if they don't exist
119-
- Existing records are updated only if the data file's modification time is newer than the record's updated time
120-
- This ensures data files can be safely reloaded without overwriting newer changes
121-
1. If records with the same primary key already exist, updates occur only when the file is newer
122-
123-
Note: While the Data Loader can create tables automatically by inferring the schema from the provided records, it's recommended to define your table schemas explicitly using the [graphqlSchema](../applications/defining-schemas) component for better control and type safety.
124-
125-
## Best Practices
65+
```bash
66+
curl http://localhost:9926/Dog/
67+
```
12668

127-
1. **Define Schemas First**: While the Data Loader can infer schemas, it's strongly recommended to define your table schemas and relations explicitly using the [graphqlSchema](../applications/defining-schemas) component before loading data. This ensures proper data types, constraints, and relationships between tables.
69+
You should see both `Harper` and `Balto` returned as JSON.
12870

129-
1. **One Table Per File**: Remember that each data file can only load records into a single table. Organize your files accordingly.
71+
### Updating Records
13072

131-
1. **Idempotency**: Design your data files to be idempotent - they should be safe to load multiple times without creating duplicate or conflicting data.
73+
What happens if you change the data file? Let’s update Harper’s age from 3 to 4 in `dogs.json`:
13274

133-
1. **Version Control**: Include your data files in version control to ensure consistency across deployments.
75+
```json
76+
{
77+
"id": 1,
78+
"name": "Harper",
79+
"breed": "Labrador",
80+
"age": 4,
81+
"tricks": ["sit"]
82+
}
83+
```
13484

135-
1. **Environment-Specific Data**: Consider using different data files for different environments (development, staging, production).
85+
When you save the file, Harper will notice the change and reload. The next time you query the endpoint, the `age` of the dog named Harper will be 4.
13686

137-
1. **Data Validation**: Ensure your data files are valid JSON or YAML and match your table schemas before deployment.
87+
The Data Loader is designed to be safe and repeatable. If a record already exists, it will only update when the file is newer than the record. This means you can re-run deployments without worrying about duplicates.
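
The "only update when the file is newer" rule can be pictured with a small sketch. This is an illustration of the described behavior, not Harper's implementation: the in-memory dict stands in for a table, and `should_write`, `load_records`, and the `updated` field are hypothetical names.

```python
def should_write(record_updated_time, file_mtime):
    """Decide whether a record from the data file should be written.

    New records are always inserted; existing records are overwritten
    only when the file was modified after the record was last updated.
    """
    if record_updated_time is None:
        return True
    return file_mtime > record_updated_time


def load_records(table, records, file_mtime, now):
    """Upsert records into `table`, a dict keyed by primary key (assumed "id")."""
    for record in records:
        current = table.get(record["id"])
        current_time = current["updated"] if current else None
        if should_write(current_time, file_mtime):
            table[record["id"]] = {"data": record, "updated": now}


if __name__ == "__main__":
    dogs = {}
    # First deployment: the record does not exist yet, so it is inserted.
    load_records(dogs, [{"id": 1, "name": "Harper", "age": 3}], file_mtime=100.0, now=150.0)
    # Redeploying a file older than the stored record changes nothing.
    load_records(dogs, [{"id": 1, "name": "Harper", "age": 4}], file_mtime=120.0, now=160.0)
    print(dogs[1]["data"]["age"])
    # A file edited after the record's last update overwrites it.
    load_records(dogs, [{"id": 1, "name": "Harper", "age": 4}], file_mtime=200.0, now=210.0)
    print(dogs[1]["data"]["age"], len(dogs))
```

Because records are keyed by primary key, loading the same file again never creates a second copy of a record.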
13888

139-
1. **Sensitive Data**: Avoid including sensitive data like passwords or API keys directly in data files. Use environment variables or secure configuration management instead.
89+
### Adding More Tables
14090

141-
## Example Component Structure
91+
If your app grows and you want to seed more than just dogs, you can create additional files. For example, a `settings.yaml` file:
14292

143-
```
144-
my-component/
145-
├── config.yaml
146-
├── data/
147-
│ ├── users.json
148-
│ ├── roles.json
149-
│ └── settings.json
150-
├── schemas.graphql
151-
└── roles.yaml
93+
```yaml
94+
database: myapp
95+
table: Settings
96+
records:
97+
- id: 1
98+
setting_name: app_name
99+
setting_value: Dog Tracker
100+
- id: 2
101+
setting_name: version
102+
setting_value: '1.0.0'
152103
```
153104
154-
With this structure, your `config.yaml` might look like:
105+
Then add it to your config:
155106
156107
```yaml
157-
# Load environment variables first
158-
loadEnv:
159-
files: '.env'
108+
dataLoader:
109+
files:
110+
- 'data/dogs.json'
111+
- 'data/settings.yaml'
112+
```
160113
161-
# Define schemas
162-
graphqlSchema:
163-
files: 'schemas.graphql'
114+
Harper will read both files and load them into their respective tables.
164115
165-
# Define roles
166-
roles:
167-
files: 'roles.yaml'
116+
## Key Takeaway
168117
169-
# Load initial data
170-
dataLoader:
171-
files: 'data/*.json'
118+
With the Data Loader, your app doesn’t start empty. It starts ready to use. You define your schema, write a simple data file, and Harper takes care of loading it. This keeps your applications consistent across environments, safe to redeploy, and quick to get started with.
172119
173-
# Enable REST endpoints
174-
rest: true
175-
```
120+
In just a few steps, we’ve gone from an empty Dog table to a real application with data that’s instantly queryable.
176121
177122
## Related Documentation
178123
179-
- [Built-In Components](../../reference/components/built-in-extensions)
180-
- [Extensions](../../reference/components/extensions)
124+
- [Data Loader Reference](../../reference/applications/data-loader) – Complete configuration and format options.
181125
- [Bulk Operations](../operations-api/bulk-operations) - For loading data via the Operations API
