Commit ec61ad9

Updated Docker source with new test cases and README (#4481)
Co-authored-by: Shahzad Haider <76992801+shahzadhaider1@users.noreply.github.com>
1 parent ad6fc8f commit ec61ad9

3 files changed: +429 -36 lines changed
pkg/sources/docker/README.md

Lines changed: 270 additions & 0 deletions
@@ -0,0 +1,270 @@
# Docker Source

## Overview

The Docker source enables TruffleHog to scan Docker images for secrets, credentials, and sensitive data. It can scan images from remote Docker registries, the local Docker daemon, and tarball files.

## Docker Fundamentals

### What is Docker?

Docker is a containerization platform that packages applications and their dependencies into isolated containers. A Docker image is a read-only template used to create containers, consisting of multiple layers stacked on top of each other.

### Key Docker Terminology

| Term | Description |
|------|-------------|
| **Image** | A read-only template containing application code, runtime, libraries, and dependencies |
| **Layer** | Each modification to an image creates a new layer. Layers are stacked and merged to form the final image |
| **Tag** | A label applied to an image (e.g., `latest`, `v1.0.0`) for version identification |
| **Digest** | A SHA256 hash that uniquely identifies an image or layer |
| **Registry** | A repository for storing and distributing Docker images (e.g., Docker Hub, Quay, GHCR) |
| **Daemon** | The Docker service running on the host that manages containers and images |
| **Tarball** | An archive file (optionally compressed) containing an exported Docker image, such as the output of `docker save` |
| **History** | Metadata about how an image was built, including commands executed |

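To make the **Digest** entry concrete: a digest is content-addressed, so it is the SHA-256 of the exact bytes of the manifest or layer blob, written with a `sha256:` prefix. The following is a minimal sketch of that idea only; `digestOf` is a hypothetical helper name and is not part of TruffleHog.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digestOf returns the "sha256:<hex>" digest string for a blob of bytes.
// The name is illustrative; it is not a TruffleHog or Docker API.
func digestOf(blob []byte) string {
	sum := sha256.Sum256(blob)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	layer := []byte("example layer contents")
	fmt.Println(digestOf(layer))

	// Changing a single byte produces a different digest, which is why a
	// digest pins an exact image or layer while a tag can be moved.
	fmt.Println(digestOf(append(layer, '!')))
}
```

Because the digest depends only on the bytes, two registries serving the same blob report the same digest.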
## Features

- **Multiple Image Sources**: Scan images from remote registries, local Docker daemon, or tarball files
- **Layer-by-Layer Scanning**: Examines each layer independently for comprehensive coverage
- **History Metadata Scanning**: Analyzes image build history for exposed secrets in commands
- **Concurrent Processing**: Parallel layer scanning for improved performance
- **Authentication Support**: Multiple authentication methods for private registries
- **File Exclusion**: Configure patterns to skip specific files or directories
- **Size Limits**: Automatically skips files exceeding 50MB to optimize performance

## Configuration

### Connection Types

The Docker source supports several image reference formats:

```go
// Remote registry (default)
"nginx:latest"
"myregistry.com/myapp:v1.0.0"
"gcr.io/project/image@sha256:abcd1234..."

// Local Docker daemon
"docker://nginx:latest"

// Tarball file
"file:///path/to/image.tar"
```
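
The three reference formats differ only by prefix, so the dispatch can be sketched as a prefix check. This is an illustration under that assumption, not the source's actual implementation; `imageKind` and `classifyImageRef` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// imageKind labels where an image reference points. These names are
// illustrative and are not TruffleHog's internal types.
type imageKind string

const (
	kindRegistry imageKind = "registry"
	kindDaemon   imageKind = "daemon"
	kindTarball  imageKind = "tarball"
)

// classifyImageRef returns the kind of a reference and the remainder
// after its prefix, following the three formats in the README.
func classifyImageRef(ref string) (imageKind, string) {
	switch {
	case strings.HasPrefix(ref, "docker://"):
		return kindDaemon, strings.TrimPrefix(ref, "docker://")
	case strings.HasPrefix(ref, "file://"):
		return kindTarball, strings.TrimPrefix(ref, "file://")
	default:
		// No prefix: treat the reference as a remote registry image.
		return kindRegistry, ref
	}
}

func main() {
	refs := []string{
		"nginx:latest",
		"docker://nginx:latest",
		"file:///path/to/image.tar",
	}
	for _, ref := range refs {
		kind, rest := classifyImageRef(ref)
		fmt.Printf("%s -> %s (%s)\n", ref, kind, rest)
	}
}
```

Note that `file://` followed by an absolute path yields three slashes, which is why the tarball example reads `file:///path/to/image.tar`.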

### Authentication Methods

#### 1. Unauthenticated (Public Images)

For public images that don't require authentication:

**YAML Configuration:**
```yaml
sources:
  - type: docker
    name: public-images
    docker:
      unauthenticated: {}
      images:
        - nginx:latest
        - alpine:3.18
```

**CLI Usage:**
```bash
trufflehog docker --image nginx:latest
```

---

#### 2. Basic Authentication

For private registries requiring username and password:

**YAML Configuration:**
```yaml
sources:
  - type: docker
    name: private-registry
    docker:
      basic_auth:
        username: myuser
        password: mypassword
      images:
        - myregistry.com/private-image:latest
        - myregistry.com/another-image:v1.0.0
```

**CLI Usage:**

TruffleHog does not currently support basic authentication (username and password) through the CLI.

---

#### 3. Bearer Token

For registries using token-based authentication (e.g., Docker Hub):

**YAML Configuration:**
```yaml
sources:
  - type: docker
    name: truffle-packages
    docker:
      bearer_token: "ghp_xxxxxxxxxxxxxxxxxxxx"
      images:
        - myorg/myapp:latest
        - myorg/frontend:v2.1.0
```

**CLI Usage:**
```bash
trufflehog docker --image myorg/myapp:latest --bearer-token eyJ_xxxxxxxxxxxxxxxxxxxx
```

---

#### 4. Docker Keychain

Uses credentials from your local Docker configuration (`~/.docker/config.json`):

**YAML Configuration:**
```yaml
sources:
  - type: docker
    name: local-docker-creds
    docker:
      docker_keychain: true
      images:
        - myregistry.com/private-image:latest
        - docker.io/myorg/app:latest
```

**CLI Usage:**
```bash
# First, authenticate with Docker
docker login myregistry.com

# Then scan using stored credentials
trufflehog docker --image myregistry.com/private-image:latest
```

**Prerequisites:**
```bash
# Authenticate with your registry first
docker login
docker login ghcr.io
docker login quay.io

# Verify credentials are stored
cat ~/.docker/config.json
```

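When no credential helper is configured, `cat ~/.docker/config.json` prints base64-encoded credentials to the terminal. As a hedged alternative, the sketch below lists only the registry hosts that have an entry under `auths`, plus the configured `credsStore` helper if any. It parses a sample document rather than the real file; `dockerConfig` and `registriesWithCreds` are hypothetical names, and only the `auths` and `credsStore` fields of the Docker config format are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// dockerConfig models only the config.json fields this sketch needs.
// Entry values are kept raw so credentials are never decoded or printed.
type dockerConfig struct {
	Auths      map[string]json.RawMessage `json:"auths"`
	CredsStore string                     `json:"credsStore"`
}

// registriesWithCreds returns the sorted registry hosts listed under
// "auths" and the name of the credential helper, if one is configured.
func registriesWithCreds(raw []byte) ([]string, string, error) {
	var cfg dockerConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, "", err
	}
	hosts := make([]string, 0, len(cfg.Auths))
	for host := range cfg.Auths {
		hosts = append(hosts, host)
	}
	sort.Strings(hosts)
	return hosts, cfg.CredsStore, nil
}

func main() {
	// Sample content standing in for ~/.docker/config.json.
	sample := []byte(`{
	  "auths": {"quay.io": {"auth": "ZXhhbXBsZQ=="}, "ghcr.io": {}},
	  "credsStore": "desktop"
	}`)

	hosts, store, err := registriesWithCreds(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("registries:", hosts)
	fmt.Println("credential helper:", store)
}
```

An empty entry such as `"ghcr.io": {}` usually means the secret itself lives in the credential helper rather than in the file.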
### File Exclusion

Exclude specific files or directories from scanning using glob patterns. Quote the pattern so that the shell passes it to TruffleHog unexpanded:

```bash
trufflehog docker --image myregistry.com/private-image:latest --exclude-paths '**/*.log'
```

## How Image Scanning Works

### Scanning Process

1. **Image Retrieval**: Fetches the image from the specified source (registry, daemon, or file)
2. **History Scanning**: Extracts and scans image configuration history for secrets in build commands
3. **Layer Processing**: Iterates through each layer in parallel
4. **File Extraction**: Decompresses and extracts files from each layer
5. **Content Scanning**: Analyzes file contents for secrets and credentials
6. **Chunk Generation**: Emits chunks of data to the detection engine

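Steps 4 through 6 can be sketched for a single layer. An image layer is a tar archive, so the sketch walks tar entries, skips files over a size limit, and hands the remaining file contents to a callback. It is a simplified illustration, not the source's code: `scanLayer`, `buildLayer`, and `maxFileSize` are hypothetical names, the layer is assumed to be already decompressed, each file is emitted whole rather than split into chunks, and layers are not processed in parallel.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// maxFileSize mirrors the 50MB limit described in this README.
const maxFileSize = 50 * 1024 * 1024

// scanLayer reads an uncompressed tar stream, emits the contents of each
// regular file at or below limit, and returns the names it skipped.
func scanLayer(layer io.Reader, limit int64, emit func(path string, data []byte)) ([]string, error) {
	var skipped []string
	tr := tar.NewReader(layer)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return skipped, nil
		}
		if err != nil {
			return skipped, err
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // directories, symlinks, and so on carry no content
		}
		if hdr.Size > limit {
			skipped = append(skipped, hdr.Name)
			continue
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return skipped, err
		}
		emit(hdr.Name, data)
	}
}

// buildLayer creates an in-memory tar archive so the demo needs no files.
func buildLayer(files map[string]string) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for name, body := range files {
		hdr := &tar.Header{Name: name, Mode: 0o600, Size: int64(len(body)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	layer, err := buildLayer(map[string]string{
		"app/.env":   "API_KEY=example",
		"app/big.db": "0123456789abcdef",
	})
	if err != nil {
		panic(err)
	}
	// A 15-byte limit stands in for maxFileSize to keep the demo small.
	skipped, err := scanLayer(layer, 15, func(path string, data []byte) {
		fmt.Printf("chunk from %s: %d bytes\n", path, len(data))
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("skipped:", skipped)
}
```

Checking the size from the tar header means an oversized file is skipped without reading its contents into memory.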
### What Gets Scanned

- **Layer Contents**: All files within each image layer
- **Build History**: Commands used to build the image (FROM, RUN, ENV, etc.)
- **Configuration**: Environment variables and labels
- **Metadata**: Image annotations and custom metadata

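Build history, environment variables, and labels all live in the image configuration JSON rather than in any layer. The sketch below shows where those strings sit in an OCI-style image config (`history[].created_by`, `config.Env`, `config.Labels`) and collects them for inspection. It is an assumption-laden illustration: `imageConfig` and `metadataStrings` are hypothetical names, and the sample JSON is invented for the demo.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// imageConfig models only the image config fields used in this sketch.
type imageConfig struct {
	Config struct {
		Env    []string          `json:"Env"`
		Labels map[string]string `json:"Labels"`
	} `json:"config"`
	History []struct {
		CreatedBy  string `json:"created_by"`
		EmptyLayer bool   `json:"empty_layer"`
	} `json:"history"`
}

// metadataStrings collects build commands, environment variables, and
// labels (as key=value, sorted by key) from an image config document.
func metadataStrings(raw []byte) ([]string, error) {
	var cfg imageConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	var out []string
	for _, h := range cfg.History {
		// Entries with empty_layer set still record a command worth
		// inspecting, even though they add no files.
		if h.CreatedBy != "" {
			out = append(out, h.CreatedBy)
		}
	}
	out = append(out, cfg.Config.Env...)

	keys := make([]string, 0, len(cfg.Config.Labels))
	for k := range cfg.Config.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		out = append(out, k+"="+cfg.Config.Labels[k])
	}
	return out, nil
}

func main() {
	sample := []byte(`{
	  "config": {"Env": ["API_KEY=example"], "Labels": {"maintainer": "team"}},
	  "history": [
	    {"created_by": "ENV API_KEY=example", "empty_layer": true},
	    {"created_by": "RUN ./install.sh"}
	  ]
	}`)
	items, err := metadataStrings(sample)
	if err != nil {
		panic(err)
	}
	for _, item := range items {
		fmt.Println(item)
	}
}
```

A secret passed through `ENV` can appear in both the history and the final environment, so the same value may be reported from more than one place.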
### What Doesn't Get Scanned

- Files larger than 50MB (configurable limit)
- Files matching exclude patterns
- Empty layers (no content changes)

## Usage Examples

### Scanning a Public Image

```bash
trufflehog docker --image nginx:latest
```

### Scanning Multiple Images

```bash
trufflehog docker --image nginx:latest --image postgres:13 --image redis:alpine
```

### Scanning from Local Docker Daemon

```bash
trufflehog docker --image docker://myapp:local
```

### Scanning a Tarball

```bash
# First, save an image to a tarball
docker save myapp:latest -o myapp.tar

# Then scan it, replacing /path/to with the directory that holds myapp.tar
trufflehog docker --image file:///path/to/myapp.tar
```

### Scanning Private Registry with Authentication

```bash
docker login my-registry.io
trufflehog docker --image my-registry.io/private-app:v1.0.0
```

## Testing Results

### Integration Test Results

| Test Case | Status | Command/Configuration | Registry URL | Notes |
|-----------|--------|----------------------|--------------|-------|
| Scan remote image on DockerHub | ✅ Success | `--image <image_name>` | https://hub.docker.com/ | Public images work without authentication |
| Scan specific tag of image on DockerHub | ✅ Success | `--image <image_name>:<tag_name>` | https://hub.docker.com/ | Tag specification working correctly |
| Scan remote image on Quay.io | ✅ Success | `--image quay.io/prometheus/prometheus` | https://quay.io/search | Public Quay.io registry supported |
| Scan multiple images | ✅ Success | `--image <image_name> --image <image_name>` | Multiple registries | Sequential scanning of multiple images |
| Scan remote image on DockerHub with token | ✅ Success | Generate token using username and password | https://hub.docker.com/ | Basic auth with PAT working |
| Scan private image on Quay | ⏸️ Halted | N/A | https://quay.io/ | RedHat requires paid account for private repos |
| Scan private image on GHCR | ✅ Success | `--image ghcr.io/<image_name>` | https://github.com/packages | GitHub Container Registry |

## Troubleshooting

### Common Issues

**Issue**: Authentication failures with private registries

**Solution**: Ensure the credentials are correct and have pull permissions. Run `docker login` first when using the Docker Keychain method.

---

**Issue**: Out of memory errors with large images

**Solution**: Reduce concurrency or scan smaller images. Consider increasing available memory.

---

**Issue**: Slow scanning performance

**Solution**: Enable concurrent processing, use the local daemon instead of a remote registry, or exclude unnecessary directories.

---

**Issue**: Files not being scanned

**Solution**: Check the exclude patterns and file size limits. Verify that the files are under 50MB.
