PDF Mosaic: How to extract text from PDF documents


Home       Features       Download       Tutorial       Version History       License       PDF Mosaic Blog       Source Code

 

The PDF Mosaic library can extract text from PDF documents and exposes the text content of a PDF as Unicode strings. With PDF Mosaic you can convert Adobe PDF documents to text files, and no Adobe product is required to access the text. Use the PDFPage.GetText() method to extract a page's text in plain text format.

This sample shows how to extract plain text from PDF documents using the PDF Mosaic library.

C# :

using PDFMosaic;
using System.IO;
using System.Diagnostics;

namespace ExtractText
{
  class ExtractText
  {
    static void Main()
    {
      // Open the source PDF (path is relative to the working directory).
      PDFDocument document = new PDFDocument("..\\..\\residential.pdf");

      // Write the text of every page to a text file; "using" closes the
      // writer even if GetText() throws.
      using (StreamWriter writer = new StreamWriter("Document text.txt"))
      {
        for (int i = 0; i < document.Pages.Count; ++i)
          writer.WriteLine(document.Pages[i].GetText());
      }

      document.Save("ExtractText.pdf", true);

      // Open the extracted text in the default application.
      Process.Start("Document text.txt");
    }
  }
}
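
When the text of several pages goes into one file, it can help to mark where each page starts. The following is a minimal sketch of one way to do that, not part of the PDF Mosaic API: PageTextWriter and WritePages are hypothetical names introduced here for illustration. The page strings are assumed to come from document.Pages[i].GetText() as in the sample above; literal strings stand in for them so that the snippet runs without the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of PDF Mosaic.
static class PageTextWriter
{
    // Writes each page's text preceded by a "--- Page N ---" header line.
    public static void WritePages(IEnumerable<string> pageTexts, TextWriter writer)
    {
        int pageNumber = 1;
        foreach (string text in pageTexts)
        {
            writer.WriteLine("--- Page " + pageNumber + " ---");
            writer.WriteLine(text ?? string.Empty);  // treat a null page as empty
            pageNumber++;
        }
    }
}

class Program
{
    static void Main()
    {
        // Stand-in data; in a real program these strings would be the
        // results of document.Pages[i].GetText().
        var pages = new List<string> { "First page text", "Second page text" };

        // UTF-8 is chosen explicitly here because the extracted text is Unicode.
        using (var writer = new StreamWriter("Document text.txt", false, Encoding.UTF8))
        {
            PageTextWriter.WritePages(pages, writer);
        }
    }
}
```

Because the helper accepts any TextWriter, the same code can write to a file, to the console, or to a StringWriter held in memory.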

 

Visual Basic .NET :

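Imports PDFMosaic
Imports System.IO
Imports System.Diagnostics

Module ExtractText
  Sub Main()
    ' Open the source PDF (path is relative to the working directory).
    ' Backslashes are not escape characters in Visual Basic strings.
    Dim document As New PDFDocument("..\..\residential.pdf")

    ' Write the text of every page to a text file; Using closes the writer
    ' even if GetText() throws.
    Using writer As New StreamWriter("Document text.txt")
      For i As Integer = 0 To document.Pages.Count - 1
        writer.WriteLine(document.Pages(i).GetText())
      Next
    End Using

    document.Save("ExtractText.pdf", True)

    ' Open the extracted text in the default application.
    Process.Start("Document text.txt")
  End Sub
End Module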

 

