PDF Mosaic: How to extract text from PDF documents



The PDF Mosaic library can extract text from PDF documents and exposes the text content of a PDF as Unicode strings. With PDF Mosaic you can convert Adobe PDF documents to text files, and no Adobe product is required to access the text. Use the PDFPage.GetText() method to extract text in plain text format.

This sample shows how to extract plain text from PDF documents using the PDF Mosaic library.

C# :

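using PDFMosaic;
using System.Drawing;
using System.IO;
using System.Diagnostics;
namespace ExtractText
{
  class ExtractText
  {
    static void Main()
    {
      PDFDocument document = new PDFDocument("..\\..\\residential.pdf");
      StreamWriter writer = new StreamWriter("Document text.txt");
      for (int i = 0; i < document.Pages.Count; ++i)
      {
        // Loop body reconstructed: GetText() is the method named above; indexing Pages by position is assumed.
        writer.Write(document.Pages[i].GetText());
      }
      writer.Close(); // flush the extracted text to disk before opening the file
      document.Save("ExtractText.pdf", true);
      Process.Start("Document text.txt");
    }
  }
}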


Visual Basic .NET :

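Imports PDFMosaic
Imports System.Drawing
Imports System.IO
Imports System.Diagnostics
Module ExtractText
  Sub Main()
    Dim document As New PDFDocument("..\..\residential.pdf")
    Dim writer As New StreamWriter("Document text.txt")
    For i As Integer = 0 To document.Pages.Count - 1
      ' Loop body reconstructed: GetText() is the method named above; indexing Pages by position is assumed.
      writer.Write(document.Pages(i).GetText())
    Next
    writer.Close() ' flush the extracted text to disk before opening the file
    document.Save("ExtractText.pdf", True)
    Process.Start("Document text.txt")
  End Sub
End Module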
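
Because the extracted text is returned as Unicode strings, it can be worth choosing the output encoding explicitly when writing it to a file. The following is a minimal sketch and not part of the PDF Mosaic samples: PageTextWriter and WritePages are hypothetical names, and the page texts are passed in as plain strings (for instance, the values returned by GetText() for each page), so the snippet does not depend on the library itself.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of PDF Mosaic.
static class PageTextWriter
{
    // Writes each page's text to 'path' as UTF-8, one page per WriteLine call.
    public static void WritePages(string path, IEnumerable<string> pageTexts)
    {
        // 'using' flushes and closes the file even if an exception is thrown.
        using (StreamWriter writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            foreach (string text in pageTexts)
            {
                writer.WriteLine(text);
            }
        }
    }
}
```

A caller could collect the per-page strings into a List<string> inside the page loop and then call PageTextWriter.WritePages("Document text.txt", texts). Compared with an explicit Close() call, the using block also releases the file if extraction fails partway through.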

