Can OCR extract text from handwritten documents?

OCR works best with printed text. Handwritten text — especially cursive — is unreliable. Neat block handwriting may work partially, but do not expect high accuracy with handwritten content.

Is it safe to use online OCR on confidential documents?

PDFFlare's OCR runs entirely in your browser using Tesseract.js. Your files never leave your device — nothing is uploaded to any server. This makes it safe for sensitive documents like contracts, medical records, and financial statements.

What image resolution works best for OCR?

300 DPI or higher produces the best results. Low-resolution scans (72-150 DPI) cause blurry characters that the OCR engine struggles to recognize. If your scan is low quality, try re-scanning at a higher resolution.

Can I OCR a multi-page scanned PDF?

Yes. PDFFlare processes each page automatically, one by one. The extracted text includes page separators so you can identify which text came from which page.

Why is the OCR output jumbled or out of order?

Complex layouts with multiple columns, tables, or text boxes can confuse the OCR engine. It reads text in the order it detects it, which may not match the visual layout. For best results, use documents with simple single-column layouts.

April 14, 2026 · 8 min read

How to Extract Text from a Scanned PDF or Image (OCR Guide)

You have a scanned contract, a photo of a whiteboard, or a screenshot of a recipe — and you need the text from it. You could retype it word by word, or you could let OCR do the work in seconds.

OCR (Optical Character Recognition) is the technology that reads text from images and scanned documents. In this guide, we will explain how OCR works, when you need it, and how to extract text from any scanned PDF or image using PDFFlare's free OCR tool.

What Is OCR and How Does It Work?

OCR stands for Optical Character Recognition. It is a technology that analyzes the shapes and patterns in an image to identify letters, numbers, and symbols. Modern OCR engines like Tesseract use machine learning models trained on millions of text samples to recognize characters across more than 100 languages and a wide range of fonts.

Here is how the process works under the hood:

Image preprocessing: The engine converts the image to grayscale, adjusts contrast, and removes noise to make text stand out from the background.
Text detection: The engine locates regions in the image that contain text — paragraphs, lines, and individual characters.
Character recognition: Each detected character is compared against trained models to determine what letter or symbol it represents.
Post-processing: The engine applies language-specific rules and dictionary lookups to correct common misreadings and improve accuracy.
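
To make the preprocessing step above more concrete, here is a minimal TypeScript sketch of grayscale conversion followed by a fixed-threshold binarization. It is an illustration only, not how Tesseract implements its pipeline: the function names `toGrayscale` and `binarize`, the RGBA input layout, and the threshold of 128 are all assumptions chosen for the example. Real engines typically use adaptive thresholds rather than a single fixed value.

```typescript
// Convert RGBA pixel data (4 bytes per pixel) into one gray value per pixel.
// The RGBA layout is an assumption for this example.
function toGrayscale(rgba: Uint8ClampedArray): Uint8ClampedArray {
  const gray = new Uint8ClampedArray(Math.floor(rgba.length / 4));
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    // Weighted sum using the common ITU-R BT.601 luma weights.
    // Uint8ClampedArray rounds and clamps the result to 0..255 on assignment.
    gray[i] = 0.299 * r + 0.587 * g + 0.114 * b;
  }
  return gray;
}

// Mark each pixel as ink (1) or background (0) using a fixed threshold.
// The default of 128 is a hypothetical value, not a recommended setting.
function binarize(gray: Uint8ClampedArray, threshold: number = 128): Uint8Array {
  const ink = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) {
    ink[i] = gray[i] < threshold ? 1 : 0;
  }
  return ink;
}
```

Dark pixels end up as 1 and light pixels as 0, which is the "text stands out from the background" effect described in the first step.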

When Do You Need OCR?

If you work with documents regularly, you probably need OCR more often than you realize. Here are the most common situations:

Extracting Text from Scanned PDFs

A scanned PDF looks like a normal document, but it is actually just a picture of each page. You cannot select text, search for a word, or copy a paragraph. This is the most common use case for OCR — turning scanned documents back into editable, searchable text.

This includes old contracts, archived paperwork, signed agreements sent by email, and any document that was photocopied or scanned.

Converting Screenshots to Text

Need to grab text from an error message, a social media post, a website screenshot, or a chat conversation? Instead of retyping it manually, run OCR on the screenshot and copy the text in seconds. This is especially useful for extracting text from images on iPhone and Android when the built-in text selection does not work.

Digitizing Receipts and Invoices

Freelancers, small business owners, and accountants often need to extract amounts, dates, and vendor names from paper receipts for expense tracking. OCR turns a photo of a receipt into text you can paste into a spreadsheet — saving hours of manual data entry every month.
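
If you would rather not pick out the numbers by hand, a small script can do a first pass over the extracted text. The sketch below is a rough example under stated assumptions: it only recognizes amounts with exactly two decimal places and dates written as YYYY-MM-DD, and the name `extractReceiptFields` is made up for this guide. Real receipts vary a lot, so treat the output as a starting point to review.

```typescript
interface ReceiptFields {
  amounts: number[];
  dates: string[];
}

// Pull candidate amounts and dates out of OCR text from a receipt.
// Assumes amounts look like 3.75 or $1,234.50 and dates look like 2026-04-14.
function extractReceiptFields(text: string): ReceiptFields {
  const amountPattern = /\$?\b(?:\d{1,3}(?:,\d{3})*|\d+)\.\d{2}\b/g;
  const datePattern = /\b\d{4}-\d{2}-\d{2}\b/g;

  const amountMatches: string[] = text.match(amountPattern) || [];
  const dateMatches: string[] = text.match(datePattern) || [];

  return {
    // Strip the currency sign and thousands separators before parsing.
    amounts: amountMatches.map((m) => parseFloat(m.replace(/[$,]/g, ""))),
    dates: dateMatches,
  };
}
```

Because OCR can confuse characters such as "O" and "0", it is worth checking the parsed amounts against the original photo before they go into your books.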

Extracting Text from Photos of Books or Notes

Students and researchers regularly photograph textbook pages, lecture slides, and handwritten notes. OCR extracts the printed text so you can paste it into your notes app, search for specific terms, or translate the content into another language.

Making Old Documents Searchable

If you have a folder full of scanned PDFs — tax documents, medical records, insurance paperwork — you cannot use Ctrl+F to find anything. Running OCR on these files gives you searchable text, making it dramatically easier to find what you need.

How to Extract Text from a Scanned PDF (Step by Step)

PDFFlare's OCR tool runs entirely in your browser using Tesseract.js. Your files never leave your device — everything is processed locally. Here is how to use it:

Go to PDFFlare's OCR tool — no signup or account needed.
Upload your file: Drag and drop a scanned PDF or image file (JPG, PNG, WebP, BMP, or TIFF). Files up to 50 MB are supported.
Select the document language: Choose the language of the text in your document. English is selected by default, but PDFFlare supports over 100 languages including Spanish, French, German, Arabic, Chinese, Japanese, and Korean.
Click "Extract Text": The OCR engine will process your document page by page. A progress bar shows the current status.
Copy or download the result: Once processing is complete, the extracted text appears in a text box. Click "Copy" to copy it to your clipboard, or "Download .txt" to save it as a text file.
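
If you are preparing many files, you can check them against the upload limits from step 2 before you start. This is a hypothetical helper, not PDFFlare's own validation code: the name `checkUpload`, the accepted `jpeg` and `tif` spellings, and the reading of "50 MB" as 50 × 1024 × 1024 bytes are assumptions.

```typescript
// File types listed in the upload step, plus common alternate spellings (assumed).
const SUPPORTED_EXTENSIONS: string[] = ["pdf", "jpg", "jpeg", "png", "webp", "bmp", "tif", "tiff"];

// Assumes "50 MB" means 50 * 1024 * 1024 bytes.
const MAX_FILE_BYTES = 50 * 1024 * 1024;

// Returns null when the file looks acceptable, or a short reason when it does not.
function checkUpload(fileName: string, sizeBytes: number): string | null {
  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";

  if (SUPPORTED_EXTENSIONS.indexOf(extension) === -1) {
    return "Unsupported file type: " + (extension || "(none)");
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return "File is larger than 50 MB";
  }
  return null;
}
```

For example, `checkUpload("scan.pdf", 1024)` returns `null`, while a `.docx` file or a 60 MB image returns a message explaining the problem.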

For multi-page scanned PDFs, PDFFlare processes each page separately and combines the output with page markers so you know which text came from which page.
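
As a rough picture of what combining pages with markers can look like, here is a short TypeScript sketch. The marker format (`--- Page N ---`) and the function name `combinePages` are assumptions for illustration; the exact separators in PDFFlare's output may differ.

```typescript
// Join per-page OCR text into one string, with a marker line before each page.
// The "--- Page N ---" format is hypothetical.
function combinePages(pages: string[]): string {
  return pages
    .map((text, index) => "--- Page " + (index + 1) + " ---\n" + text.trim())
    .join("\n\n");
}
```

Keeping the page number in the text makes it easy to trace a sentence back to the scanned page it came from.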

How to Extract Text from an Image

The process is identical for images. Upload a JPG, PNG, WebP, BMP, or TIFF file and PDFFlare runs OCR on it directly. This works for:

Screenshots from any device
Photos of documents, receipts, or business cards
Photos of whiteboards or notes (printed text only; handwriting is unreliable)
Scanned pages saved as image files
Infographics and images with embedded text

For the best results, use clear, high-resolution images. Blurry photos, extreme angles, and very small text reduce OCR accuracy.

Tips for Getting the Best OCR Results

OCR accuracy depends heavily on the quality of your input. Here are practical tips to get the cleanest text possible:

1. Use High-Resolution Scans

Scan at 300 DPI or higher. Low-resolution scans (72-150 DPI) make characters blurry, and the OCR engine struggles to distinguish between similar letters like "l" and "1" or "O" and "0".
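
If you are not sure what resolution an existing scan has, you can estimate it from the pixel dimensions and the physical page size. The sketch below shows the arithmetic; the function names are made up for this guide, and it assumes the image covers the full width of the page. It also uses the standard definition of a point as 1/72 of an inch to show how tall a character ends up in pixels.

```typescript
const POINTS_PER_INCH = 72;

// Effective scan resolution: pixels across divided by the page width in inches.
// Assumes the image spans the whole page width.
function effectiveDpi(widthPx: number, pageWidthInches: number): number {
  return widthPx / pageWidthInches;
}

// Approximate height in pixels of text set at a given point size.
function glyphHeightPx(fontSizePt: number, dpi: number): number {
  return (fontSizePt / POINTS_PER_INCH) * dpi;
}
```

A US Letter page (8.5 inches wide) scanned to 2550 pixels across works out to 300 DPI. At that resolution, 12 pt text is about 50 pixels tall, while at 72 DPI the same text is only about 12 pixels tall, which leaves the engine far less detail to tell similar shapes apart.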

2. Ensure Good Contrast

Dark text on a white background produces the best results. Faded documents, colored backgrounds, and light gray text reduce accuracy. If your scan is faded, try adjusting brightness and contrast in an image editor before running OCR.
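
For readers who prefer to script the fix, the simplest contrast adjustment is a linear stretch: the darkest pixel is mapped to black, the lightest to white, and everything else is scaled in between. The sketch below works on a grayscale buffer with one byte per pixel; `stretchContrast` is a hypothetical name, and image editors generally offer more refined versions of the same idea.

```typescript
// Linearly rescale gray values so the darkest becomes 0 and the lightest 255.
// Input is assumed to be one gray byte per pixel.
function stretchContrast(gray: Uint8ClampedArray): Uint8ClampedArray {
  let min = 255;
  let max = 0;
  for (let i = 0; i < gray.length; i++) {
    if (gray[i] < min) min = gray[i];
    if (gray[i] > max) max = gray[i];
  }

  const out = new Uint8ClampedArray(gray.length);
  if (max <= min) {
    // Flat (or empty) image: there is no contrast to stretch, so copy as-is.
    out.set(gray);
    return out;
  }
  for (let i = 0; i < gray.length; i++) {
    // Uint8ClampedArray rounds and clamps the result on assignment.
    out[i] = ((gray[i] - min) * 255) / (max - min);
  }
  return out;
}
```

A faded scan whose values only span 50 to 250 would be spread across the full 0 to 255 range, making light gray text darker relative to the page.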

3. Keep the Page Straight

Skewed or rotated text confuses the OCR engine. If your scanned PDF has rotated pages, use PDFFlare's Rotate PDF tool to fix the orientation before running OCR.

4. Select the Correct Language

The OCR engine loads language-specific models. Selecting the wrong language can cause misreadings — especially for non-Latin scripts like Arabic, Chinese, or Korean. If your document contains multiple languages, run OCR separately for each language section.

5. Clean Up Noise

Coffee stains, wrinkles, stamps, and other marks over text can confuse OCR. If possible, scan a clean copy of the document. For photos, ensure even lighting without shadows.

OCR Limitations: What It Cannot Do

OCR is powerful, but it has real limitations you should be aware of:

Handwriting: Current OCR engines work best with printed text. Handwritten text recognition is unreliable, especially for cursive writing. Block letters in neat handwriting may work, but do not count on it.
Complex layouts: Tables, multi-column text, and heavily formatted documents may produce jumbled output. OCR reads text in the order it detects it, which may not match the visual reading order of complex layouts.
Very small text: Text below 8pt in the original document often gets misread, especially in low-resolution scans.
Decorative fonts: Ornamental, script, and highly stylized fonts reduce accuracy. Standard serif and sans-serif fonts work best.
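
The complex-layout problem is easier to see with a toy example. The sketch below is not how Tesseract analyzes layout; it simply contrasts a naive top-to-bottom ordering with a column-aware one on a two-column page. The `WordBox` shape, the function names, and the `gutterX` parameter (the horizontal position separating the columns) are all assumptions for illustration.

```typescript
interface WordBox {
  text: string;
  x: number; // left edge in pixels
  y: number; // top edge in pixels
}

// Naive order: top to bottom, then left to right.
// On a two-column page this interleaves lines from both columns.
function naiveOrder(boxes: WordBox[]): string[] {
  return boxes
    .slice()
    .sort((a, b) => a.y - b.y || a.x - b.x)
    .map((box) => box.text);
}

// Column-aware order: read everything left of the gutter, then the right column.
function columnOrder(boxes: WordBox[], gutterX: number): string[] {
  const left = boxes.filter((box) => box.x < gutterX);
  const right = boxes.filter((box) => box.x >= gutterX);
  return naiveOrder(left).concat(naiveOrder(right));
}
```

With two lines per column, the naive order reads across both columns line by line, which is the kind of jumbled output described above, while the column-aware order keeps each column together.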

Is It Safe to Use Online OCR Tools?

Most online OCR tools upload your files to a server for processing. If your document contains sensitive information — contracts, medical records, financial statements — this is a legitimate privacy concern.

PDFFlare is different. Your files never leave your device. The OCR engine (Tesseract.js) runs entirely in your browser. No upload, no server processing, no data collection. This makes it safe for confidential documents that you would not want stored on someone else's server.

OCR vs. Copy-Paste: Why You Cannot Just Select Text

If you have ever tried to select text in a scanned PDF and nothing happened, you have encountered the fundamental problem OCR solves. Regular PDFs store text as vector data — each character has a defined position, font, and encoding. Scanned PDFs store pages as flat images. To the computer, a scanned page is just a picture — the same as a photo of a landscape. There are no text characters to select.

OCR bridges this gap by analyzing the image, finding text-like patterns, and converting them into actual characters you can select, copy, and search.

What to Do After Extracting Text

Once you have the extracted text, here are some common next steps:

Paste into a Word document: Create an editable version of the scanned document. Use PDFFlare's Word to PDF tool to convert it back to PDF when done editing.
Search for keywords: Use Ctrl+F in the text file to find specific names, dates, or amounts in long documents.
Translate the content: Paste extracted text into Google Translate or DeepL for instant translation.
Import into a spreadsheet: Paste receipt or invoice data into Excel or Google Sheets for accounting.
Archive and index: Save the text alongside the original scan for full-text search capability.
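
If you keep the extracted text page by page, the keyword search and indexing steps above can be scripted in a few lines. This is a minimal sketch that assumes you have one string per page; `findPagesWithKeyword` is a made-up name, and the match is a plain case-insensitive substring check rather than a real search index.

```typescript
// Return the 1-based page numbers whose text contains the keyword.
// Matching is case-insensitive and substring-based.
function findPagesWithKeyword(pages: string[], keyword: string): number[] {
  const needle = keyword.toLowerCase();
  const hits: number[] = [];
  pages.forEach((text, index) => {
    if (text.toLowerCase().indexOf(needle) !== -1) {
      hits.push(index + 1);
    }
  });
  return hits;
}
```

For a long contract, this tells you which scanned pages to open when you are looking for a particular name, date, or amount.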

Wrapping Up

OCR turns images and scanned documents into usable text — saving you from retyping content that is already right there on the page. Whether you are digitizing old paperwork, extracting data from receipts, or grabbing text from a screenshot, OCR handles it in seconds.

PDFFlare's OCR tool is free, runs in your browser, and supports 100+ languages. No signup, no upload, no privacy concerns. Try it with your next scanned document and see how much time it saves.

Related Tools

PDF to Word — convert the OCR output into an editable Word file
PDF to JPG — extract scanned pages as images
Edit PDF — add searchable text annotations