
Extract Text From a PDF on Windows for Free

Extract text from a PDF free on Windows using WSL and pdftotext. Pull clean text from one PDF or a whole folder, keep the layout, all offline with no uploads.

MCSA Guru Team · June 21, 2026 · 3 min read
[Image: A WSL terminal extracting text from a PDF into a plain text file with pdftotext on Windows]

Copy-pasting text out of a PDF is a coin flip: sometimes it works, sometimes you get gibberish, and sometimes the page won’t let you select anything at all. Uploading a confidential report to an online extractor is a poor alternative when the document isn’t meant to leave your hands.

With WSL you can pull the text out locally, for free, using pdftotext from the Poppler project. One PDF or a whole folder, layout preserved if you want it, nothing uploaded.

No WSL yet? See the WSL install guide.

Install the tool

pdftotext is in poppler-utils:

sudo apt update && sudo apt install -y poppler-utils

Confirm:

pdftotext -v
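
If the version check fails, the shell could not find the binary. A small guard along these lines can go at the top of any script that depends on it. This is only a sketch, and `require_pdftotext` is a hypothetical helper name, not part of Poppler:

```shell
# Hypothetical helper: stop early with a hint when pdftotext is missing.
require_pdftotext() {
    if command -v pdftotext >/dev/null 2>&1; then
        return 0
    fi
    echo "pdftotext not found; install it with: sudo apt install -y poppler-utils" >&2
    return 1
}

# Usage at the top of a script:
#   require_pdftotext || exit 1
```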

Extract text from a PDF

Give it the PDF and an output filename:

pdftotext input.pdf output.txt

Leave off the output name and it writes input.txt next to the PDF. To see the result straight away in the terminal, send it to standard output with -:

pdftotext input.pdf -

That prints the whole document’s text to the screen, which is handy for a quick look or for piping into a search.
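
Piping into a search might look like the following sketch. `pdf_grep` is a hypothetical helper name used here for illustration, and the search is case-insensitive by choice:

```shell
# Hypothetical helper: case-insensitive search inside one PDF.
# $1 = search term, $2 = path to the PDF
pdf_grep() {
    # "-" sends the extracted text to standard output. grep -n numbers the
    # lines of the extracted text, not the pages of the PDF.
    pdftotext "$2" - | grep -i -n -- "$1"
}

# Usage:
#   pdf_grep "invoice" report.pdf
```

Nothing is written to disk, so this suits a quick check for whether a document mentions a term at all.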

Fix jumbled, multi-column text

Reports, papers, and newsletters with columns often come out interleaved because the default reading order guesses wrong. The -layout option keeps the visual arrangement:

pdftotext -layout input.pdf output.txt

This usually straightens out columns and tables. If the plain extraction looks scrambled, reach for -layout first.
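
When it is unclear which mode suits a document, one option is to extract it both ways and compare the results. A minimal sketch, assuming the input name ends in .pdf; `compare_layout` and the .plain.txt / .layout.txt suffixes are made up for this example:

```shell
# Hypothetical helper: extract the same PDF with and without -layout.
# $1 = path to the PDF (expected to end in .pdf)
compare_layout() {
    base="${1%.pdf}"
    pdftotext "$1" "$base.plain.txt" || return 1
    pdftotext -layout "$1" "$base.layout.txt" || return 1
    echo "Wrote $base.plain.txt and $base.layout.txt"
}

# Usage:
#   compare_layout report.pdf
#   diff report.plain.txt report.layout.txt | less
```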

Extract only certain pages

Use -f (first) and -l (last) to limit the range:

pdftotext -f 2 -l 5 input.pdf output.txt

That pulls text from pages 2 through 5 only. Setting both to the same number grabs a single page.
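
The same two flags can be wrapped in a small function that rejects a reversed range before running the command. This is a sketch only; `extract_pages` and the output filenames are hypothetical:

```shell
# Hypothetical helper: extract an inclusive page range.
# $1 = first page, $2 = last page, $3 = input PDF, $4 = output text file
extract_pages() {
    if [ "$1" -gt "$2" ]; then
        echo "first page ($1) is after last page ($2)" >&2
        return 1
    fi
    pdftotext -f "$1" -l "$2" "$3" "$4"
}

# Pages 2 through 5:
#   extract_pages 2 5 input.pdf pages-2-5.txt
# Page 3 only (first and last are the same):
#   extract_pages 3 3 input.pdf page-3.txt
```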

pdftotext options

Command                               What it does
pdftotext in.pdf out.txt              Extract all text to a file
pdftotext in.pdf -                    Print text to the terminal
pdftotext -layout in.pdf out.txt      Preserve columns and layout
pdftotext -f 2 -l 5 in.pdf out.txt    Only pages 2 to 5
pdftotext -enc UTF-8 in.pdf out.txt   Force UTF-8 output encoding

Batch a whole folder

Turn every PDF in a folder into a matching text file:

for f in *.pdf; do pdftotext -layout "$f" "${f%.pdf}.txt"; done

report.pdf becomes report.txt, and so on, with the PDFs left in place.
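
For larger folders, a slightly more defensive variant of the loop may help: it skips cleanly when the folder holds no PDFs and reports any file that fails to convert. This is a sketch rather than a finished tool, and `batch_extract` and the example folder path are hypothetical:

```shell
# Hypothetical helper: convert every .pdf in a folder, reporting failures.
# $1 = folder to process (defaults to the current directory)
batch_extract() {
    dir="${1:-.}"
    failed=0
    for f in "$dir"/*.pdf; do
        # With no matches the pattern stays literal, so skip non-files.
        [ -f "$f" ] || continue
        if pdftotext -layout "$f" "${f%.pdf}.txt"; then
            echo "ok:     $f"
        else
            echo "failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if no file failed.
    [ "$failed" -eq 0 ]
}

# Usage from WSL, where the C: drive is mounted at /mnt/c by default
# (the folder name is only an example):
#   batch_extract "/mnt/c/Users/you/Documents/reports"
```

The pattern is case-sensitive, so files named with an uppercase .PDF extension would need their own pass.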

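The one case this can’t handle: scanned PDFs

pdftotext reads the text layer of a PDF. A scanned PDF is just images of pages, so there is no text layer to pull and the output comes back empty or nearly so. Run OCR (optical character recognition) first, with a tool like Tesseract, to turn the images into selectable text, then extract as usual.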
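
A rough way to spot image-only PDFs before a batch run is to check whether extraction yields anything besides whitespace. The sketch below is a heuristic, not a guarantee, and `has_text_layer` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the PDF yields any visible text.
# $1 = path to the PDF
has_text_layer() {
    # Drop all whitespace, including the form feeds pdftotext emits between
    # pages, then see whether anything is left.
    text=$(pdftotext "$1" - 2>/dev/null | tr -d '[:space:]')
    [ -n "$text" ]
}

# Usage: list the PDFs that probably need OCR first.
#   for f in *.pdf; do has_text_layer "$f" || echo "needs OCR: $f"; done
```

A scan that has already been through OCR will pass this check, since it carries a text layer even though the pages are images.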

Wrapping up

Extracting text from a PDF on Windows is one command: pdftotext input.pdf output.txt, with -layout when columns get scrambled and -f/-l to limit the pages. A short loop converts a whole folder. The only thing it can’t do is read scanned pages; those need OCR first.

It’s free and runs in WSL, so even sensitive documents stay on your machine. While you’re working with PDFs, the same Poppler package powers converting PDF pages to images.

Frequently asked questions

How do I get the text out of a PDF?

pdftotext reads a PDF and writes its text to a plain .txt file. Give it the input PDF and an output name, and it pulls the text in reading order. The original PDF is left unchanged.

Why does my extracted text come out jumbled?

Complex layouts like multi-column pages can scramble reading order. The -layout option tells pdftotext to preserve the visual arrangement, which usually fixes columns and tables. Try it if the default output is out of order.

Can I extract text from a scanned PDF?

Not with pdftotext alone. A scanned PDF is just images of pages, so there's no text layer to pull. You'd need OCR (optical character recognition) first, with a tool like Tesseract, to turn the images into selectable text.

Is my PDF uploaded anywhere when extracting text?

No. pdftotext runs locally in WSL, so the file stays on your machine. That matters for contracts, reports, or anything confidential you want to pull text from without sending it to a website.

How do I extract text from many PDFs at once?

Use a short loop that runs pdftotext on each PDF and writes a matching .txt file. The loop in this guide does that, so a folder of PDFs becomes a folder of text files in one pass.

MCSA Guru Team

IT & Systems Administration

We are working IT pros and system administrators who spend our days in Windows Server, Microsoft 365, and the wider Microsoft stack. MCSA Guru is where we write down the fixes and walkthroughs we wish we had found the first time.
