PDF Text Extractor

A minimal Node.js + Express application that extracts plain text from uploaded PDF files. It uses express-fileupload to handle multipart form data and pdf-parse v2 to process the document on the server. A lightweight frontend is included so you can test the flow directly in the browser.

Features

Upload a PDF from the browser and read the extracted text in a large textarea.
POST /extract API endpoint that accepts multipart/form-data uploads and responds with plain text.
Graceful error handling and human-friendly status messages.

Requirements

Node.js ≥ 18 (the project targets active LTS releases).
npm (ships with Node.js) or another compatible package manager.

Getting Started

git clone https://github.com/Keremunce/nodejs-pdf-extractor.git
cd nodejs-pdf-extractor
npm install
npm start

The server starts on http://localhost:3004. Visit the same address in your browser to open the test page.

Usage

Click PDF file and choose a local .pdf.
Press Extract text.
Wait a moment; status updates appear just above the textarea.
Read or copy the extracted text from the textarea.

API Reference

POST /extract
Content-Type: multipart/form-data
Field name: pdf (File)

200 – Success. Returns the extracted text as text/plain; charset=utf-8.
400 – No file was included in the request.
500 – Server-side error while parsing the PDF.
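
The status codes above could be produced by a handler along the following lines. This is a minimal sketch, not the code in index.js: `createExtractHandler` and `parsePdf` are hypothetical names, and `parsePdf` (a function taking a Buffer and resolving to a string) stands in for whatever call the project makes into pdf-parse. It assumes express-fileupload is registered, so uploads appear on `req.files` keyed by field name with the raw bytes in `.data`.

```javascript
// Hypothetical factory: returns an Express-style handler for POST /extract.
// `parsePdf` is an injected stand-in (Buffer -> Promise<string>) for the PDF library.
function createExtractHandler(parsePdf) {
  return async function extractHandler(req, res) {
    // express-fileupload places uploaded files on req.files, keyed by field name.
    const file = req.files && req.files.pdf;
    if (!file) {
      // 400: the request did not include a "pdf" field.
      return res.status(400).type("text/plain").send("No PDF file was uploaded.");
    }
    try {
      // file.data holds the uploaded bytes as a Buffer.
      const text = await parsePdf(file.data);
      return res.status(200).type("text/plain").send(text);
    } catch (err) {
      // 500: the parser threw while processing the document.
      return res.status(500).type("text/plain").send("Failed to parse the PDF.");
    }
  };
}
```

Injecting the parser keeps the status-code logic testable without a real PDF or a running server.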

Example using curl:

curl -X POST http://localhost:3004/extract \
  -F "pdf=@/path/to/document.pdf"
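
The same request can be made from JavaScript. The sketch below assumes Node.js 18 or later, where `fetch`, `FormData`, and `Blob` are available as globals; `extractText` is a hypothetical helper name, and the default base URL is the address given in Getting Started.

```javascript
// Hypothetical client helper: uploads PDF bytes to /extract and returns the text.
async function extractText(pdfBytes, filename = "document.pdf", baseUrl = "http://localhost:3004") {
  const form = new FormData();
  // The field name must be "pdf", as listed in the API reference.
  form.append("pdf", new Blob([pdfBytes], { type: "application/pdf" }), filename);

  // Do not set Content-Type by hand; fetch adds the multipart boundary itself.
  const response = await fetch(`${baseUrl}/extract`, { method: "POST", body: form });
  const body = await response.text();

  if (!response.ok) {
    // 400 (no file) and 500 (parse failure) both end up here.
    throw new Error(`Extraction failed (${response.status}): ${body}`);
  }
  return body;
}
```

To use it, read a file into a Buffer (for example with `readFile` from `node:fs/promises`) and pass the result to `extractText`.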

Project Structure

.
├── index.js          # Express server and /extract endpoint
├── public
│   └── index.html    # Browser UI for manual testing
├── package.json
└── README.md

Contributing

Pull requests and issues are welcome! If you find a bug or would like to improve the UI, documentation, or extraction accuracy, please open an issue first so we can discuss the change.

License

This project is licensed under the MIT License. See LICENSE for details.
