PDFLib: Textflows

In our previous post we introduced the PDFLib toolkit and went over some basic functions.  We created a single page document and placed a “Hello World!” line on it.  This time we are going to start with that base set of code and expand it to be able to fit whole paragraphs on the page and format them how we see fit. 

The first concept to introduce is textflows.  A textflow is a set of text with some formatting that is held in memory in the PDFLib object.  It is represented by an integer that is returned by the create_textflow method.  Textflows are used to place arbitrary amounts of text within a space. To demonstrate all of this, copy the code from the last post into a new solution. Then make the following edits: 

Add the textendx, textendy, and tf integer variable declarations to the beginning of the program.

Between where we declare the optlist and end the page in the old code, remove the line that places the “Hello World!” text line and replace it with the code below.

Below is the output of the program when you run it. We will go through what each part of the code contributed to this output.

The text that I am using is the lorem ipsum text which you can copy below: 

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 

This text is a standard used to check formatting, but you are free to use any text you like. Just ensure that it will span three lines or so on a standard letter sized page.

Now, on to the output PDF file. The first thing we see is that the first line of text runs off the page. This line of text comes from pdf.fit_textline(text, 35, 757, optlist). With this call we are telling PDFLib to place a line of text 35 points from the left edge and 35 points from the top edge of the page. PDFLib's default unit is the point (1/72 of an inch) and y is measured upward from the bottom of the page, so on a letter size page, which is 792 points tall, 35 points from the top is y = 757. Note that the “start” of a textline in PDFLib is the bottom left corner of the first letter. The optlist is declared above this line and is how we set the font and size of the text. The “text” variable is the lorem ipsum text from above. However, we see in the output that because a textline is a single line that never wraps, the text is far too long and runs right off the edge of the page.
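
The conversion from "distance from the top" to a y coordinate is simple enough to do in your head, but it is easy to get backwards. Here is a minimal sketch of that arithmetic, assuming a letter size page measured in points. The class and method names are made up for illustration and are not part of PDFLib.

```csharp
using System;

public static class CoordinateDemo
{
    // Letter size page height in points (11 inches * 72 points per inch).
    public const double LetterHeight = 792;

    // Hypothetical helper, not a PDFLib call: converts a distance measured
    // down from the top edge into a y coordinate measured up from the bottom edge.
    public static double YFromTop(double pageHeight, double distanceFromTop)
    {
        return pageHeight - distanceFromTop;
    }

    public static void Main()
    {
        // 35 points down from the top of a letter page is the y passed to fit_textline.
        Console.WriteLine(YFromTop(LetterHeight, 35)); // prints 757
    }
}
```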

To handle this we need to create a textflow. We do this with the following line: tf = pdf.create_textflow(text, optlist). The tf variable is now an integer representing the internal reference the PDFLib object has associated with that textflow. Now that we have the textflow we need to place it. To do this, PDFLib requires 4 values: llx, lly, urx, and ury. These values correspond to lower left x, lower left y, upper right x, and upper right y respectively. With these 4 coordinates we define the rectangle that the text should fit in.

In the example code we call pdf.fit_textflow(tf, 35, 35, 577, 742, "fitmethod=auto verticalalign=top"). The tf variable is the reference to the above textflow. The first two 35s are llx and lly, which place the lower left corner of the box at the page margin (which we have arbitrarily decided to be 35 points). 577 is 612 (the width of a letter size page) minus 35 (the margin size). 742 is 757 (the lower bound of the first textline) minus 15 (an arbitrary amount of space between paragraphs). The final parameter is the optlist for the fit_textflow method, of which there are many options. For now what is important to understand is that “fitmethod=auto” tells the method to automatically try to fit the text in the box. The “verticalalign=top” option tells the method to align the text with the top of the box.
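
The four numbers passed to fit_textflow can also be derived from the page size and margin rather than typed in by hand. This is a small sketch of that calculation, assuming a letter size page. The Fitbox helper is hypothetical and only does arithmetic; it does not call PDFLib.

```csharp
using System;

public static class FitboxDemo
{
    // Letter size page in points.
    public const double PageWidth = 612;
    public const double PageHeight = 792;

    // Hypothetical helper: builds the rectangle for a textflow that uses the
    // same margin on the left, right, and bottom, and ends at a given top y.
    public static (double llx, double lly, double urx, double ury) Fitbox(double margin, double top)
    {
        return (margin, margin, PageWidth - margin, top);
    }

    public static void Main()
    {
        // The top of the box sits 15 points below the first textline at y = 757.
        var box = Fitbox(35, 757 - 15);
        Console.WriteLine($"{box.llx}, {box.lly}, {box.urx}, {box.ury}");
        // prints 35, 35, 577, 742
    }
}
```

Deriving the values this way means a change to the margin only has to be made in one place.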

We see the result of all of this in the output file. What we get in this block of text is what you would expect from most word processing programs: a paragraph with margins and lines breaking at the spaces between words.

Now we must get into the most important aspect of PDFLib when you have multiple items that need to be placed on a page and you don’t know what size they might be: querying the page. When you place items on a page, PDFLib provides some useful tools to determine what the final result of that placement was. Since we are not using a graphical interface, this is vital for determining where to place subsequent items on a page in order to get the desired layout. For textflows, we do this with the info_textflow method.

textendx = (int)pdf.info_textflow(tf, "textendx");

textendy = (int)pdf.info_textflow(tf, "textendy"); 

What we are doing with this is determining the x and y coordinates of the bottom right of the last character placed in the textflow.  Using this information we can place a text line that we can be sure does not overlap with the paragraph.  To demonstrate this, we call the final text placing method from the above code. 

pdf.fit_textline("This is 15 pixels below the preceding text", 35, textendy-15, optlist); 

Here we are placing a text line with the y location being 15 points below the end of the paragraph. We can see from the output file that this line is indeed below the paragraph. However, if you look closely you can see that although we tried to put 15 points between each set of text, the first break is larger than the second. This brings up an important discussion about fitboxes and what is actually being referenced by the x and y coordinates.

The fit_textline method uses the y coordinate to denote the bottom of the text line. The fit_textflow method uses ury to denote the top of the fitbox for the whole paragraph. This is how we get a 15 point space between the first line we placed and the textflow below.

When we use the info_textflow method to determine textendy we are getting the bottom of the last character placed. When we subtract 15 from that coordinate we are 15 points below it. However, since we then call fit_textline, we place the bottom of the textline at that coordinate. Since we are using a 10 point font, whose letters take up roughly 10 points of height, this leaves us with only about 5 points between the final line and the paragraph above it. This is the reason the spacing is different even though in code it may seem like we are using the same 15 point value.
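
The arithmetic above can be written out as a short sketch. It assumes, as the paragraph does, that a line of text is about as tall as its font size, which is only an approximation. The textendy value of 700 is a made-up stand-in for whatever info_textflow returns, and both helper methods are hypothetical rather than part of PDFLib.

```csharp
using System;

public static class SpacingDemo
{
    // Visible space between the end of the paragraph and the top of a textline
    // whose bottom is placed at baselineY (text height approximated by fontSize).
    public static double VisibleGap(double textEndY, double baselineY, double fontSize)
    {
        return textEndY - (baselineY + fontSize);
    }

    // The y to pass to fit_textline so that the visible gap equals the wanted gap.
    public static double BaselineForGap(double textEndY, double gap, double fontSize)
    {
        return textEndY - gap - fontSize;
    }

    public static void Main()
    {
        double textEndY = 700;  // hypothetical value standing in for info_textflow's result
        double fontSize = 10;

        // Placing the line at textendy - 15 leaves only 5 points of visible space.
        Console.WriteLine(VisibleGap(textEndY, textEndY - 15, fontSize)); // prints 5

        // To see a full 15 point gap, the line has to go 25 points below textendy.
        Console.WriteLine(BaselineForGap(textEndY, 15, fontSize)); // prints 675
    }
}
```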

The point of all of this is that we must be careful to understand the returns of info methods and the references of the fitboxes so that we place items correctly on the page.  More complex forms tend to be very iterative in their design process so the better we understand the unique nature of the PDFLib coordinate system the fewer iterations we will need. 

To finish off, let’s take a look at some of the options available to us for fit_textflow. Remove the fit_textline and info_textflow lines from the previous code. Then change the ury argument of the fit_textflow method from 742 to 757. Then copy the textflow creation and placement lines and paste them directly below. Finally, change the verticalalign option for the second fit_textflow to “bottom”. Your code should look like this:

You may be asking why the second create_textflow method is called.  The reason is that a textflow may only be placed once.  If we were to call fit_textflow again on the same tf variable we would get an error. If you place a breakpoint on the second create_textflow you will notice that before executing this line tf is equal to 0.  After executing this line it is equal to 1.  This is the PDFLib object’s reference to the textflow it has created internally. 

After you run the program what you see on the output will look like this:

Followed by a large amount of blank space and then at the bottom of the page, this:

What we have done here is define a fitbox that is equivalent to a letter sized page with 35 point margins. When we place the first set of text we are telling PDFLib to align everything to the top of this fitbox. The second set of text does just the opposite, aligning all of the text to the bottom of the fitbox.
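
To see where the blank space in the middle of the page comes from, here is a rough model of the two placements. It assumes each paragraph takes up 48 points of height, which is a made-up figure. PDFLib works out the real height itself, and the helper methods below are hypothetical and do not call PDFLib.

```csharp
using System;

public static class AlignDemo
{
    // Vertical span of a text block aligned to the top of the fitbox.
    public static (double bottom, double top) TopAligned(double ury, double blockHeight)
    {
        return (ury - blockHeight, ury);
    }

    // Vertical span of a text block aligned to the bottom of the fitbox.
    public static (double bottom, double top) BottomAligned(double lly, double blockHeight)
    {
        return (lly, lly + blockHeight);
    }

    // Empty space left between a top-aligned and a bottom-aligned block of equal height.
    public static double BlankBetween(double lly, double ury, double blockHeight)
    {
        return TopAligned(ury, blockHeight).bottom - BottomAligned(lly, blockHeight).top;
    }

    public static void Main()
    {
        double blockHeight = 48; // assumed paragraph height, for illustration only
        Console.WriteLine(BlankBetween(35, 757, blockHeight)); // prints 626
    }
}
```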

Now remove the second set of textflow creation and placement lines and change the lly value of the first fit to 737, which is 20 points below the ury of 757. Your code should look like this:

This leaves us with a box only 20 points tall to fit this whole set of text into. What we get is the following:

Notice that the text is much smaller and there are only 3 lines.  This is because there is not enough space to fit all of that text at the font size we specified.  However, since we are using the fitmethod=auto option, PDFLib automatically lowers the size of the font until it will fit.  If we were using fitmethod=clip we would get the following:

Notice that we have lost the last line of text. Since we could not fit all of the text in the fitbox and we used the fitmethod=clip option PDFLib stops placing text once it runs out of space. 

If you would like the format of the text within the box to be different then you need to edit the optlist of the create_textflow method.  For instance we can center justify the text by adding an alignment=center option to the optlist. 

string optlist = String.Format("alignment=center fontsize=10 font={0}", font);

This gives us an output that looks like this:

There are a multitude of combinations of text style and placement style.  PDFLib empowers you to place arbitrary sets of text in almost any way you can imagine.  Next time we will discuss images and tables and all of the different ways to add them to your PDFs.  Once there you will be close to being able to make any sort of PDF you require.