Home Features Download Tutorial Version History License PDF Mosaic Blog Source Code
The PDF Mosaic library can extract text from PDF documents and exposes the text content of a PDF as Unicode strings. With PDF Mosaic you can convert Adobe PDF documents to text files. The SDK reads the text content of PDF files without requiring any Adobe product. Use the PDFPage.GetText() method to extract a page's text as plain text.
This sample shows how to extract plain text from PDF documents using the PDF Mosaic library.
C#:

using PDFMosaic;
using System.Drawing;
using System.IO;
using System.Diagnostics;

namespace ExtractText
{
    class ExtractText
    {
        static void Main()
        {
            // Open the source PDF document.
            PDFDocument document = new PDFDocument("..\\..\\residential.pdf");

            // Write the text of every page to a single text file.
            StreamWriter writer = new StreamWriter("Document text.txt");
            for (int i = 0; i < document.Pages.Count; ++i)
                writer.WriteLine(document.Pages[i].GetText());
            writer.Close();

            document.Save("ExtractText.pdf", true);

            // Open the resulting text file in the default viewer.
            Process.Start("Document text.txt");
        }
    }
}
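Visual Basic .NET:

Imports PDFMosaic
Imports System.Drawing
Imports System.IO
Imports System.Diagnostics

Module ExtractText
    Sub Main()
        ' Open the source PDF document.
        ' Visual Basic strings do not use backslash escapes, so single backslashes are correct here.
        Dim document As New PDFDocument("..\..\residential.pdf")

        ' Write the text of every page to a single text file.
        Dim writer As New StreamWriter("Document text.txt")
        For i As Integer = 0 To document.Pages.Count - 1
            writer.WriteLine(document.Pages(i).GetText())
        Next
        writer.Close()

        document.Save("ExtractText.pdf", True)

        ' Open the resulting text file in the default viewer.
        Process.Start("Document text.txt")
    End Sub
End Module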
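A variation that may be useful when each page's text is needed separately: the sketch below writes one UTF-8 text file per page. It uses only the PDF Mosaic calls shown in the samples above (the PDFDocument constructor, Pages.Count and GetText()), and it assumes GetText() returns a string. The page-N.txt file naming is a hypothetical choice made for illustration, not something the library prescribes.

```csharp
using System.IO;
using System.Text;
using PDFMosaic;

namespace ExtractTextPerPage
{
    class ExtractTextPerPage
    {
        static void Main()
        {
            // Same input file as in the samples above.
            PDFDocument document = new PDFDocument("..\\..\\residential.pdf");

            for (int i = 0; i < document.Pages.Count; ++i)
            {
                // Hypothetical naming scheme: page-1.txt, page-2.txt, ...
                string fileName = "page-" + (i + 1) + ".txt";

                // The using block closes the file even if GetText() throws.
                // Assumption: GetText() returns a string.
                using (StreamWriter writer = new StreamWriter(fileName, false, Encoding.UTF8))
                {
                    writer.Write(document.Pages[i].GetText());
                }
            }
        }
    }
}
```

Wrapping the StreamWriter in a using block means the file is flushed and closed without an explicit Close() call, and passing the encoding explicitly keeps the Unicode text intact in the output files.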