PDF Creator – about project and development process

This article describes the organization of the PDF Creator project and the tools that we use to develop and promote the product.

Development Tools

1. Subversion (SVN) Control

The life of a programmer would be unbearable without a source control manager :-). As of this writing, we have made 2387 commits in the PDF Creator project using TortoiseSVN (http://tortoisesvn.tigris.org/). Since every mistake is costly, we try to follow one rule, which we would recommend to everyone: always commit small portions of code. Every commit must correspond to a single task!

This way it is (a) easier to find an error during rollbacks (every commit changes a small amount of source code), (b) often easy to discover an error before the commit just by looking at it, and (c) easier for colleagues to be aware of the changes in the source code, since all they have to do is read the comments and look at the smaller portions of the code.

Of course, this is not possible in all cases, and sometimes we do have to perform huge commits. Still this is a rule worth following. After all, it is one of the refactoring rules: Change smaller portions of the code, preserving its workability between sessions.

Two more source-control rules:

  1. There must always be two versions in the SVN – a version that compiles and a workable version. It is easy to create code that compiles, but the real aim is its correctness.
  2. Every commit should be accompanied by a text comment. This is also a good way to check whether the commit implements only a single task. If a comment describes several targets instead of just one, the rule has been violated.

2. Bug Tracker

We use Mantis (http://www.mantisbt.org/). PDF Creator is the base for a number of other products – converters, virtual printer, etc. The number of its users is much higher than the count of those who purchased the base product. We needed to simplify communications between the developers and the users of the products.

To make things clear for those who have never used a bug tracker, here is an example of how it works. The virtual printer developers receive (from the users of their product) a report about an incorrect conversion into a PDF. The virtual printer receives an EMF file as input, and our library converts it. The information from the user is used to create a “ticket” for the library developers. The ticket is accompanied by a description of the problem and the problematic EMF file. Then we assess the severity of the problem and assign a specific developer to work on a fix. After the problem has been resolved, the ticket is closed. Everybody concerned is notified and may keep an eye on what is going on.

3. Development Environment

We use Visual Studio 2008, C++, and C#.

4. Refactoring Tools

There are no built-in tools for C++ refactoring in Visual Studio, so we have to use a third-party plug-in: Visual Assist X (http://www.wholetomato.com/).

5. Unit Tests in C++

Here we use UnitTest++ (https://sourceforge.net/).

 

Project Organization

PDF Creator is a library for PDF processing: creation, reading, modification, text extraction, etc. Currently, the project exists as a solution with 18 sub-projects in Visual Studio 2008. There are five general types of sub-projects.

1. The Library Core

Here we define all the logic of PDF processing. This project is always in the process of modification. One of our old dreams (and still a dream today) is to make the core cross-platform. The biggest obstacle is the fact that EMF is a proprietary Windows format. To make our dream come true, we need to separate the EMF conversion from the cross-platform part.

2. Client Interfaces

Two of the projects implement the client interfaces. One of them implements the COM interface; the other one implements a static library. The COM interface delegates its tasks to the static library; the static library forwards them to the core. No code is duplicated.
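The layering described above can be sketched roughly as follows. All class and method names here (Core, StaticLib, ComFacade, AddPage, PageCount) are hypothetical and chosen for illustration only; they are not the actual PDF Creator API, and the numeric return codes merely stand in for COM HRESULT values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical core: the only layer that contains PDF-processing logic.
class Core {
public:
    void addPage() { ++pages_; }
    std::size_t pageCount() const { return pages_; }
private:
    std::size_t pages_ = 0;
};

// Hypothetical static-library interface: forwards every call to the core.
class StaticLib {
public:
    void AddPage() { core_.addPage(); }
    std::size_t PageCount() const { return core_.pageCount(); }
private:
    Core core_;
};

// Hypothetical stand-in for the COM layer: delegates to the static library.
// A real COM object would also implement IUnknown/IDispatch.
class ComFacade {
public:
    long AddPage() {
        lib_.AddPage();
        return 0;                      // 0 plays the role of S_OK
    }
    long get_PageCount(long* out) {
        if (!out) return -1;           // stand-in for an invalid-pointer error
        *out = static_cast<long>(lib_.PageCount());
        return 0;
    }
private:
    StaticLib lib_;
};

// One call through the outermost layer reaches the core exactly once.
inline void layeringExample() {
    ComFacade com;
    com.AddPage();
    long n = 0;
    com.get_PageCount(&n);
    assert(n == 1);
}
```

Because each layer only translates the calling convention and forwards the call, a fix in the core becomes visible through both client interfaces at once.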

The static library isn’t advertised on our site; however, every client may receive it together with the COM version. The static library contains some undocumented features that allow users to get additional information about a document. If you are a developer and your product is intended to view PDFs, these features may be very useful for you.

3. Test Projects

  1. Non-automated tests for the COM interface. A set of APIs that form and render PDF documents immediately after their launch.
  2. Unit tests for the static library and the core. In addition to those described immediately above, the unit tests check the internal state of the classes, handle errors, etc.
  3. Projects that check the EMF conversion. The first one (written in WTL) converts the specified EMF file into a PDF. Its interface is minimalistic.
    The second one (written in C#) is a work in progress. However, it is already possible to convert all EMF files from a folder and to instantly review the results. This speeds up testing significantly, which is especially welcome since we already have hundreds of test metafiles.

4. Auxiliary Projects

These include a font-processing library (which parses TrueType and Type1 fonts, works with encodings, and modifies TrueType fonts), a font installer, a project for working with the keys, etc.

5. Third Party Open-Source Products

PDF is a complex format. It is based on many areas of science, multiple algorithms, and formats. It involves compression and encryption algorithms, various image and font formats, color spaces, etc. A big part of our implementation is based on open source projects with a free license, for instance, CxImage, ZLib, LibJPEG, Lcms, and UnitTest++.

What you should know before deploying your .NET application that uses PDF Creator Pilot and HTML2PDF Addon to another server.

First, you need to install PDF Creator Pilot and HTML2PDF Addon on this server.

It is not necessary to run the installers for these products. Instead, you may register the corresponding DLL files in the system by running the following commands (on both x86 and x64 versions):

C:\Windows\system32\regsvr32 PDFCreatorPilot.dll
C:\Windows\system32\regsvr32 HTML2PDF.dll

Second, you need to re-create the interop wrappers for these components. This can be done with a standard .NET SDK utility, TlbImp.exe (C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin\TlbImp.exe).

Example:

TlbImp.exe PDFCreatorPilot.dll /out:Interop.PDFCreatorPilotLib.dll
TlbImp.exe HTML2PDF.DLL /out:Interop.HTML2PDFAddOn.dll

Note: It is not necessary to regenerate the wrappers directly on the server. You may do it on the developer machine and copy these wrappers to the server.

Third, you need to place these interop wrappers into the appropriate directory on the new server. For ASP.NET, it is the “bin” folder of the application. For other applications, place the wrappers in the same folder as the exe module.

Filimonov Maxim

COM arrays in PHP

We don’t claim to be experts in PHP, and an educated reader might find this article to be a redundant description of evident facts. Nevertheless, we hope that this article may be of some help to somebody.

We wanted to check whether our library (this one) works well with PHP.

Using PDF Creator Pilot on ASP.NET Web Pages without Visual Studio

To work with ASP.NET, we must perform three steps:

  1. Create an Interop-wrapper
  2. Copy the wrapper into a specific folder
  3. Attach the namespace libraries to the application

To create the Interop-wrapper of PDF Creator Pilot (i.e. a wrapper that makes it possible to call the unmanaged COM-object code of the library from the managed code of an ASP.NET application), we should use one of the standard utilities from the .NET SDK: TlbImp.exe (C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin\TlbImp.exe).

Example:

TlbImp.exe PDFCreatorPilot3.dll /out:Interop.PDFCreatorPilot3Lib.dll

Then we need to copy the wrapper into the “bin” subfolder of the web-application root folder. (If that folder does not yet exist, we will have to create it.)

Example:

If our web-application is located in “C:\Inetpub\wwwroot\MyApp”, then we should put the wrapper into “C:\Inetpub\wwwroot\MyApp\bin”.

To import the wrapper’s namespace into the web-application, we should add the following directive to the “.aspx”-file. The namespace is the wrapper’s file name without the “.dll” extension:

<%@ Import Namespace="Interop.PDFCreatorPilot3Lib" %>

After that, a COM-object of PDF Creator Pilot may be used from ASP.NET.

Example:

<%@ Import Namespace="System" %>
<!-- other import directives are here -->
<%@ Import Namespace="Interop.PDFCreatorPilot3Lib" %>
<HTML> <HEAD>
  <TITLE>Test</TITLE>
  <SCRIPT language="C#" runat="server">
    void ButtonPerform_Click(object sender, System.EventArgs e)
    {
      PDFDocument3Class pdf = new PDFDocument3Class();
      pdf.StartEngine("demo@demo", "demo");
      pdf.AutoCreateURL = true;
      // set other options if needed
      pdf.BeginDoc();
      // do something
      pdf.EndDoc();
    }
  </SCRIPT>
</HEAD>
<BODY>
  <!-- here page content goes -->
  <FORM runat="server">
    <INPUT type="button" id="ButtonPerform" value="Click Me"
      OnServerClick="ButtonPerform_Click" runat="server" />
    <!-- or another variant -->
    <asp:Button id="ButtonPerform1" Text="Click Me"
      OnClick="ButtonPerform_Click" runat="server" />
  </FORM>
</BODY>
</HTML>

Maxim Filimonov

C++ Unit Tests

Hi everybody!

Here I talk about various tools that may be used to develop unit tests in C++, in particular about the one that we use in PDF Creator, and about the way we use it.

As PDF Creator evolved, it became more and more evident that we needed an easy-to-use testing procedure. The huge code base, lots of clients, quirks of the PDF format and of the library itself: these were the factors that turned every change in the code into a difficult task. Currently, every time we release a new version (actually, even more frequently), we pass the library through a set of visual tests. Some tests produce PDF files; their quality and correctness may be checked visually. Other tests convert EMF files, producing a visible result, too. There are also ASP and VB scripts among the tests. The set of tests grows constantly. However, even when the library passes all of them, there is no guarantee that all of its functions still work.

Because of that, while developing PDF Creator 3.9, we paid a lot of attention to refactoring our code. One of our refactoring goals was to improve the testability of the code. After we determined the goals, we had to choose an environment for writing the unit tests. Initially, it was a choice between the classic CppUnit and the Boost.Test Library.

Here is a summary of the requirements that we imposed on the testing environment:

  1. Writing a test should require minimal preparation.
  2. Test results should be clearly revealed and easily observed. Integration into Visual Studio is a plus.
  3. The environment must be cross platform. (In the future, we plan to make the PDF library platform independent.)
  4. The size of the environment must be small.
  5. Test units must be automatically excluded from the Release build.

I had some experience with CppUnit before, and I was not pleased. It failed requirements 1, 2, and 4. It contains a lot of redundant stuff that we did not actually need. The interface of CppUnit is not trivial, and we would have had to add a lot of code ourselves. Negative. Especially after NUnit.

We were going to use Boost.Test. However, we discovered that we would have to include the entire Boost Library, with all its bells and whistles. I am not a big expert in Boost. If there is a way to use Boost.Test without adding the entire library to the project, please tell us how. We had to reject Boost.

Then we had to look for an alternative. I think that http://www.gamesfromwithin.com/articles/0412/000061.html is a must read. It is an excellent comparison of various unit test tools.

After reading the article, of course, we looked into CxxTest (http://cxxtest.sourceforge.net/), and it disappointed us. The documentation is huge, but not very understandable. The latest revision was outdated: from 2004! Compilation requires Perl! Another unsuitable environment.

We looked into some other variants, and none of them seemed to fit. With our hope of finding anything slowly dying already, we found UnitTest++. (Applause!) You can see it here: http://unittest-cpp.sourceforge.net/. One of the UnitTest++ developers, Noel Llopis, is also the author of the excellent article referenced above. Here is his description of his own product: http://www.gamesfromwithin.com/articles/0603/000108.html

UnitTest++ fits all 5 of our requirements. It’s trivial to write a test. The environment is easy to understand and cross-platform. Whenever a test fails, it is detached into a separate project. The toolset seamlessly integrates into Visual Studio. When the tests are run, the output window displays the number of tests, elapsed time, and reviews of the failed tests. It was like a dream come true.

We started to use UnitTest++ in PDF Creator. Soon we discovered that we did not receive its excellent features “for free”. Let’s look under the hood of a typical test:

TEST(SomeTest)
{
    const int expected = 123;
    int res = testedFunction();
    // CHECK_EQUAL takes the expected value first, then the actual one
    CHECK_EQUAL(expected, res);
}

During compilation, the TEST macro turns into a class named TestSomeTest, which inherits from the class UnitTest::Test. The last line of the expanded macro looks like this:

void TestSomeTest::RunImpl(UnitTest::TestResults& testResults_) const

As you can see, the body of the test becomes the body of RunImpl. After we looked inside the CHECK_EQUAL macro, we understood that it produces correct results only under limited circumstances: only inside methods defined with the TEST macro. That meant that we had to forget about normal refactoring. In particular, it is impossible to extract part of a test into a separate method.
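To make the mechanism more tangible, here is a minimal self-contained sketch of how a TEST-like macro can expand into a self-registering class. It only imitates the shape described above and is not the UnitTest++ source: the sketch namespace, SKETCH_TEST, SKETCH_CHECK_EQUAL, checkIsEven, and runAll are all names invented for this illustration. The real macros may depend on further names being in scope, so the helper shown here is a possible direction rather than a drop-in recipe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Collects failures; plays the role of UnitTest::TestResults.
struct TestResults {
    std::vector<std::string> failures;
};

// Base class; plays the role of UnitTest::Test. Every instance registers itself.
class Test {
public:
    explicit Test(const char* name) : name_(name) { list().push_back(this); }
    virtual ~Test() = default;
    virtual void RunImpl(TestResults& testResults_) const = 0;
    const char* name() const { return name_; }
    static std::vector<const Test*>& list() {
        static std::vector<const Test*> tests;
        return tests;
    }
private:
    const char* name_;
};

}  // namespace sketch

// Declares a class, creates one global instance of it, and leaves the braces
// that follow the macro as the body of RunImpl.
#define SKETCH_TEST(Name)                                               \
    class Test##Name : public sketch::Test {                            \
    public:                                                             \
        Test##Name() : sketch::Test(#Name) {}                           \
        void RunImpl(sketch::TestResults& testResults_) const override; \
    } test##Name##Instance;                                             \
    void Test##Name::RunImpl(sketch::TestResults& testResults_) const

// The check silently refers to a local named testResults_, so it compiles
// only where a variable with exactly that name is in scope.
#define SKETCH_CHECK_EQUAL(expected, actual)                            \
    do {                                                                \
        if (!((expected) == (actual)))                                  \
            testResults_.failures.push_back(#expected " != " #actual);  \
    } while (0)

// A helper can use the check if it takes a parameter with that same name.
inline void checkIsEven(sketch::TestResults& testResults_, int value) {
    SKETCH_CHECK_EQUAL(0, value % 2);
}

SKETCH_TEST(Passing) { SKETCH_CHECK_EQUAL(4, 2 + 2); }
SKETCH_TEST(Failing) { checkIsEven(testResults_, 3); }

// Runs every registered test and returns the number of recorded failures.
inline std::size_t runAll() {
    sketch::TestResults results;
    for (const sketch::Test* t : sketch::Test::list()) {
        assert(t != nullptr);
        t->RunImpl(results);
    }
    return results.failures.size();
}
```

In this sketch, the hidden dependency on the name testResults_ is what ties the check macro to the test body; passing the results object explicitly under the same name is one way a helper could still use it.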

It is also impossible to temporarily disable a test, since it is a macro. To switch a test off, one has to comment it out.

Another problem: IntelliSense doesn’t always work properly while a test method is being written. Again, an evil consequence of using macros. It is also hard to extend and improve the functionality of UnitTest++. Perhaps that’s the reason why it has not been updated for quite a long time: since April 2007.

However, it is not a bad environment. Refactoring problems may be by-passed. The other problems may be put up with. In my next article, I plan to provide more details about testing with UnitTest++ and the process of hunting for memory leaks with the help of memleaks.

Vitaly Shibaev

About PDF Creator Pilot 3.9

Hi, everybody! My name is Vitaly Shibaev. I am one of the developers of the PDF Creator library. In my first note, I want to talk specifically about version 3.9.

In my opinion, the most important improvement in this version is how it renders text. Any kind of text now renders correctly and remains searchable and clipboard compatible. In general, the library generates smaller output files, especially when CJK fonts are used.

We also got rid of memory leaks, and along the way we have seriously refactored the code. As a result, the library works faster and is more stable.

Instead of using BoundsChecker to catch the leaks, we used two great source files that really saved us: mmgr.h and mmgr.cpp. Just add them to your project and put #include "mmgr.h" in a header file, and you’re all set. After that, three log files will be created in the same location as the application executable (memleaks.log, memory.log and memreport.log). The “memleaks.log” file reports any and all leaks. When you turn on “memreport.log” in “mmgr.cpp” (it is turned off by default), it provides detailed reports about all memory allocations and deallocations (when, how much, etc.). I would recommend mmgr to all C++ developers.
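For readers who wonder how a tool of this kind can notice leaks at all, here is a minimal sketch of the general idea: replace the global operator new and operator delete and keep a count of live allocations. This is not the mmgr code and does far less than mmgr does (no file names, no sizes, no log files, no thread safety); the names g_liveAllocations and runsWithoutLeaks are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of blocks obtained from operator new and not yet released.
static std::size_t g_liveAllocations = 0;

// Replacement for the global allocation function: allocate, then count.
void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    ++g_liveAllocations;
    return p;
}

// Replacement for the global deallocation function: count, then free.
void operator delete(void* p) noexcept {
    if (!p) return;
    assert(g_liveAllocations > 0);
    --g_liveAllocations;
    std::free(p);
}

// Sized deallocation (C++14) forwards to the unsized version above.
void operator delete(void* p, std::size_t) noexcept {
    ::operator delete(p);
}

// Returns true if running `work` left no additional allocation outstanding.
template <typename F>
bool runsWithoutLeaks(F work) {
    const std::size_t before = g_liveAllocations;
    work();
    return g_liveAllocations == before;
}
```

A check such as runsWithoutLeaks around a piece of code tells you only that something leaked, not where; recording the source location of every allocation is what makes a report like “memleaks.log” useful in practice.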

There are some more important improvements in the library. It now renders CMYK JPEG images. This feature is vital for users with special image quality requirements, especially in pre-press.

Other improvements are related to interactive forms and annotations. First, file sizes have been optimized. In most cases, their size will be 2-3 times smaller than previously. Second, we’ve improved the rendering of Unicode symbols.

We’ve added the Caption property to the radio button descriptions. Caption is now a part of the control. It’s coded like this:

PDF.PDFPAGE_CreateControl_RadioButton "rb_group", 0, 0, 100, 20
PDF.PDFANNOTATION_Caption = "Place some text here"

Revealing some of our secrets, we’ve reorganized and improved our code in version 3.9. We added some unit tests (not for all the code yet). The process of choosing a proper unit test environment for C++ was of special interest. It would be a good topic for the next time.

Vitaly Shibaev

What do you want to use PDF Creator Pilot for?

We try to make PDF Creator Pilot as useful as possible for you. We add new features, fulfill requests, and fix bugs. Now we need your help. Please tell us what you use (or want to use) PDF Creator Pilot for. This information will help us to make PDF Creator Pilot even better for you.

Sergius Bobrovsky