A.nnotate server installation guide
Part 2: optional modules

This chapter describes how to install the optional modules for A.nnotate, and should be read after completing the basic install (with PDF and HTML support). Note that installing these modules is more complex than installing the basic A.nnotate server. You can install all, none or just a subset of these modules, depending on your requirements.

1. Enabling the apache user to run programs

PHP scripts typically run as the apache user, which is a restricted account with no home directory. On Ubuntu linux, the default apache user is www-data. One way to find out which user apache runs as on your system is to type ps aux | grep httpd (grep apache2 on Ubuntu) and see who owns the cluster of 'httpd' processes.
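
As a rough sketch of the check just described, the shell function below picks the first non-root owner of an httpd or apache2 process out of ps aux output. The function name apache_user_from_ps is made up for this example, and it assumes the server processes have 'httpd' or 'apache2' somewhere in their command line.

```shell
# Print the first non-root owner of an httpd/apache2 process.
# Reads `ps aux`-style lines on stdin, so it also works on saved output.
# Assumption: the process command line contains 'httpd' or 'apache2'.
apache_user_from_ps() {
    awk '$1 != "root" && /httpd|apache2/ && !/grep/ { print $1; found = 1; exit }
         END { exit !found }'
}

# Typical use on a live server:
#   ps aux | apache_user_from_ps
```

The function returns a non-zero status if no matching process is found, which usually means the web server is not running or uses a different process name.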

If you want to run openoffice or firefox as the apache user (to support uploading Word documents and generating thumbnails of web pages), you will need to create a home directory where these applications can store their profile settings.

   # As root: first check if the apache user
   # already has a home directory that it can write to:
   #   N.B. if apache runs as a different user
   #   e.g. 'www-data' you should replace 'apache' with 'www-data'
   #   throughout this guide, e.g.
   #     su www-data
   #   
   % su apache
   $ cd
   $ touch tmp.tmp

   # If this works, and the apache home directory is writable by
   # the apache user, you can skip the rest of this section.
   # If this fails, you need to set up a home directory
   # for the apache user - the example below sets it
   # to '/var/www/ahome':-

   % cd /var/www
   % mkdir ahome
   % chown apache ahome
   % chgrp apache ahome

   # Enable home directory for apache:
   % vi /etc/passwd

   # edit the entry for the apache user to allow logins
   # and set the home dir, e.g.:
   apache:x:48:48:Apache:/var/www/ahome:/bin/bash

   # Check you can su to the apache user now:
   % su apache
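
If you would rather not edit /etc/passwd by hand, the sketch below first shows how to read the current home directory and login shell for a user from passwd-format input. The helper name passwd_home_and_shell is hypothetical. The standard usermod command can then make the same change as the manual edit; it is shown only as a comment because it must be run as root.

```shell
# Print "HOME SHELL" for the named user, reading passwd-format lines on stdin.
passwd_home_and_shell() {
    awk -F: -v user="$1" '$1 == user { print $6, $7 }'
}

# Typical use:
#   passwd_home_and_shell apache < /etc/passwd
#
# Alternative to editing /etc/passwd directly (as root):
#   usermod -d /var/www/ahome -s /bin/bash apache
# usermod may refuse to change the home directory while the user has
# running processes, so you may need to stop the web server first.
```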

At this point you should also change the settings in the configuration file in annotate/scripts/bashconfig.inc, which is included by the various scripts for running openoffice and firefox:

  % su annotate
  $ cd /var/www/html/annotate/scripts
  $ vi bashconfig.inc

  # as the annotate user, edit the settings in annotate/scripts/bashconfig.inc:

  APACHE_USER=apache
  APACHE_HOME=/var/www/ahome

2. Adding OpenOffice support

Install openoffice 3.x.x or later from www.openoffice.org. It will install itself to somewhere like /opt/openoffice.org3/program.

Many linux distributions only offer older versions of openoffice via their standard install mechanism, so it is worth downloading directly from the openoffice download page. Once openoffice 3.x.x is installed on the server, follow the steps below to check that it runs:-

   # Check you can run openoffice as a normal user:
   % su annotate
   $ vi .bashrc
   export PATH=/opt/openoffice.org3/program:$PATH
   $ source .bashrc

   # On your local machine, run 'xhost +' and check your firewall
   # can accept X connections on the normal TCP port (6000)
   $ export DISPLAY={your ip}:0.0
   $ soffice &

   # on Ubuntu, the executable could be called 'ooffice' not 'soffice'
  

There is a test document in annotate/scripts which you can try, but first you need to check the settings in annotate/scripts/bashconfig.inc:

  # As the annotate user, check the paths in the config file: scripts/bashconfig.inc:
  % su annotate
  $ cd /var/www/html/annotate/scripts          # ... where you installed annotate
  $ vi bashconfig.inc
  OOPATH=/opt/openoffice.org3/program
  OOEXE=/opt/openoffice.org3/program/soffice
  OOEXENAME=soffice.bin
  OOPYTHON=/opt/openoffice.org3/program/python

  # If any of these are not correct, edit the bashconfig.inc file.
  # The OOPYTHON setting points to the version of python which is
  # bundled with the installation of openoffice from openoffice.org.
  # For Ubuntu, you can change this to your standard python install
  #  (e.g. /usr/bin/python).  You will need the python-uno package  
  #  installed for calling openoffice.


  # The following command should convert 'sample.doc' to '/tmp/sample.pdf'
  $ ./ooconv.sh sample.doc /tmp/sample.pdf
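
Before running the test conversion it can save time to confirm that the configured paths actually exist. The sketch below assumes bashconfig.inc consists of plain VAR=value shell assignments, as in the excerpt above, and checks that OOEXE and OOPYTHON point at executable files. The function name check_oo_paths is invented for this example.

```shell
# Source a bashconfig.inc-style file in a subshell and verify that
# OOEXE and OOPYTHON are executable. Returns non-zero if either is not.
# Assumption: the config file contains only simple VAR=value assignments.
check_oo_paths() {
    (
        . "$1" || exit 2
        status=0
        for exe in "$OOEXE" "$OOPYTHON"; do
            if [ ! -x "$exe" ]; then
                echo "not executable: $exe" >&2
                status=1
            fi
        done
        exit "$status"
    )
}

# Typical use:
#   check_oo_paths /var/www/html/annotate/scripts/bashconfig.inc
```

Running the check in a subshell keeps the settings from leaking into your login shell.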

Next, check that you can run the conversion as the apache user (first complete the steps in section 1, 'Enabling the apache user to run programs').

  # ... login as root, then switch to the apache user ('www-data' on ubuntu)
  % su apache

  $ cd /var/www/html/annotate/scripts   # ... or your install directory
  $ ./ooconv.sh sample.doc /tmp/sample2.pdf

  # If you get something like ERROR! ErrorCodeIOException 525
  # check that the output PDF file is writable by the current user

Running openoffice in server mode

The test conversion above started openoffice, converted a document, then killed the openoffice process. This can take a few seconds for each document. You can avoid the openoffice startup time by running it in server mode, listening to a socket for incoming documents. This also has the advantage that you can run openoffice as a separate user from the Apache one (e.g. you could create a new user just to run openoffice).

  # Create the user you want to run openoffice as:
  # (as root)
  % adduser openoffice

  % su openoffice

  # (as the 'openoffice' user)
  $ cd /var/www/html/annotate/scripts
  $ ./oocron.sh                 # this starts up openoffice

  # Check that the 'soffice.bin' process is running:
  $ ps aux | grep soffice

  # Try converting a test file again a couple of times, running
  # as the apache user:
  # (as root)
  % su apache

  $ ./ooconv.sh sample.doc /tmp/test5.pdf
  $ ./ooconv.sh sample.doc /tmp/test6.pdf

  # All being well, the second time should have been much
  # faster, as you avoid the startup time of openoffice.

  # You need to keep the openoffice process alive all the time
  # e.g. using a cron job, as your chosen openoffice user, adding an
  # entry like:
  # as root...
  % su openoffice
  $ crontab -e
* * * * * bash /var/www/html/annotate/scripts/oocron.sh >/dev/null 2>&1

While openoffice is running in server mode, the conversions from office formats should be much faster.
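
The contents of oocron.sh are not reproduced in this guide. To illustrate the general idea behind running a keep-alive script from cron every minute, the sketch below starts a command only if the process recorded in a pid file is no longer alive. The function name ensure_running and the pid-file approach are assumptions made for this example, not a description of how oocron.sh is implemented.

```shell
# Start a command in the background unless the pid stored in PIDFILE
# still belongs to a live process.
# Usage: ensure_running PIDFILE COMMAND [ARGS...]
ensure_running() {
    pidfile=$1
    shift
    if [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0    # already running, nothing to do
    fi
    "$@" >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
}

# Hypothetical use from a script that cron runs every minute:
#   ensure_running /tmp/my-daemon.pid /path/to/my-daemon --some-option
```

Because the check is cheap, calling it once a minute costs almost nothing while the process is healthy, and restarts it within a minute if it dies.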

Updating the php/phpconfig.inc file to enable openoffice support

To enable support for the office formats when you upload a document to your annotate server, edit your php/phpconfig.inc file as follows:

  # Edit the setting in php/phpconfig.inc to point to the ooconv.sh script:
  % su annotate
  $ cd /var/www/html/annotate/php        # ... or your install directory
  $ vi phpconfig.inc
  $ooshcommand="/bin/bash /var/www/html/annotate/scripts/ooconv.sh";

  # Test it out by uploading a short Word / openoffice file on your
  # documents.php page.

Installing Windows Fonts for OpenOffice on Linux

By default, an openoffice installation on Linux will not have access to the standard Windows fonts (Arial, Verdana etc), which can cause problems with the Word to PDF conversion for documents created on a Microsoft operating system. Unlike PDF files, Word files do not include the fonts they depend on, and assume the recipient has the relevant fonts installed. However, it is possible to install the Windows standard fonts on Linux which greatly improves the quality of generated PDFs from Word files.

# Install microsoft truetype fonts on Ubuntu / debian:
  % sudo apt-get install msttcorefonts

External guides cover installing TrueType fonts on Linux in more detail, including the newer MS Vista fonts. The basic steps for installing TrueType fonts and making them available to applications (including openoffice) are outlined below. On Windows systems, your fonts will be installed to a path like: C:\WINDOWS\Fonts\*.ttf. You will have to restart openoffice after installing fonts.

# Check you have the standard PostScript Type1 fonts installed:
# (e.g. on Fedora:)
  % yum install ghostscript-fonts

# Steps for installing additional TrueType fonts on Linux
  % cd /usr/share/fonts/truetype
  % mkdir myfonts
  % cd myfonts
# ... copy the *.ttf files to myfonts/
  % mkfontdir
  % fc-cache
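
After installing fonts it can help to confirm that fontconfig can actually see them. The sketch below reads fc-list output on stdin and prints any of the named fonts that do not appear in it. The function name missing_fonts is invented for this example, and the match is a simple case-insensitive substring test, so treat the result as a hint and not a guarantee.

```shell
# Print each requested font name that does not occur anywhere in the
# fc-list output supplied on stdin (case-insensitive substring match).
missing_fonts() {
    listing=$(cat)
    for family in "$@"; do
        printf '%s\n' "$listing" | grep -iF -- "$family" >/dev/null \
            || printf '%s\n' "$family"
    done
}

# Typical use:
#   fc-list | missing_fonts Arial Verdana "Times New Roman"
```

No output means every listed font was found.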

3. File upload progress meter

To display a progress bar during upload, you need to install a Perl script into the cgi-bin directory.


  # If you haven't set up your server to run cgi-bin scripts yet:
  #   As root, check the Apache configuration file
  #   (e.g. /etc/httpd/conf/httpd.conf, /etc/apache2/apache2.conf
  #   or /mnt/install/apache/conf/httpd.conf)
  #   and check that the mod_alias module is installed.
  %   vi /etc/httpd/conf/httpd.conf
  #   The cgi-bin setting will be in a line like:-
  ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

  # You can put the Annotate perl scripts in a subdirectory:
  # as root:
  % mkdir /var/www/cgi-bin/annotate
  % chmod a+rx /var/www/cgi-bin/annotate
  % cd /var/www/cgi-bin/annotate
  % cp /var/www/html/annotate/cgi-bin/* .

  # The settings are in a perl file 'cgi-bin/fheader.inc' - edit this
  # to make sure the paths are set correctly. The important
  # setting here is the temporary directory to use for uploads,
  # as it must agree with the setting in php/phpconfig.inc
  # You can leave it as the default (/tmp/annotate), or change
  # it in both places.

  % vi fheader.inc
  $tmp_dir="/tmp/annotate";

  % chmod a+x *

  # Test running the perl script from the command line.
  # If you get any error messages here, the script won't
  # run from the cgi-bin directory either.

  % ./fheader.inc
  # ... this should do nothing.

  # Try visiting:  http://your.server/cgi-bin/annotate/printenv.pl from browser
  # to check the cgi-bin is working. It should print out a list of environment
  # variables to the browser.

  # Gotchas: if you get 'Internal Server Error' it could be
  # caused by having DOS not Unix return characters at the end of
  # the script lines.  You can fix this with the dos2unix command.
  # Also worth checking the log files (somewhere like /var/log/httpd)
  # Some perl / cgi-bin installations have security settings which 
  # will only run cgi-bin programs if they have the same owner/group
  # as the cgi-bin user, so you may need to set the owner of the cgi-bin
  # perl scripts if this is the case.

  # Edit the php/phpconfig.inc file to switch the file upload progress bar on:
  % su annotate
  $ cd /var/www/html/annotate/php
  $ vi phpconfig.inc

  $uploadtmpdir = "/tmp/annotate";

  $fileuploadprogress = true;
  $fileuploadcgibin = "/cgi-bin/annotate/";


  # Try uploading a pdf file by browsing to your documents.php page;
  # you should now see a blue progress bar during the upload.
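
Two of the problems mentioned above can be checked from the shell: DOS line endings in the scripts, and a temporary directory that differs between fheader.inc and phpconfig.inc. The helpers below are a minimal sketch. The names has_crlf and quoted_setting are made up for this example, and quoted_setting assumes each setting is written on one line as $name = "value";.

```shell
# Succeed if the file contains at least one carriage return (DOS line ending).
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Print the double-quoted value assigned to $NAME in a Perl or PHP file.
# Usage: quoted_setting NAME FILE   (NAME without the leading '$')
# Assumption: the assignment is on a single line, e.g. $tmp_dir="/tmp/annotate";
quoted_setting() {
    sed -n 's/^[[:space:]]*\$'"$1"'[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$2" | head -n 1
}

# Typical use:
#   has_crlf /var/www/cgi-bin/annotate/fheader.inc && echo "DOS line endings"
#   quoted_setting tmp_dir      /var/www/cgi-bin/annotate/fheader.inc
#   quoted_setting uploadtmpdir /var/www/html/annotate/php/phpconfig.inc
```

If the two quoted_setting calls print different directories, change one of the files so that they agree.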

4. Generating thumbnails of websites

(Figure: thumbnails of the textensor.com site)

To generate thumbnails of websites (displayed on your index pages next to the list of notes), you need to install a web browser (firefox) and a virtual X framebuffer (Xvfb). You will also need the netpbm tools (pnmtopng, pnmscale).

   # As root:
   # (on fedora core 4):-
     % yum install xorg-x11
     % yum install xorg-x11-Xvfb

   # (on fedora 8):-
     % yum install xorg-x11-server-Xorg
     % yum install xorg-x11-server-Xvfb
     % yum install xorg-x11-fonts-*
     % yum install xorg-x11-apps-*
   
   # (on ubuntu)
     % sudo apt-get install xvfb

   % yum install firefox
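
Once the packages are installed, a quick way to confirm that everything needed for thumbnails is on the PATH is to look each tool up with the shell's command -v builtin. The function name missing_tools is invented for this sketch; the tool names are the ones listed above.

```shell
# Print the name of each listed program that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Typical use (run it as the user that will generate the thumbnails):
#   missing_tools Xvfb firefox pnmtopng pnmscale
```

No output means all of the tools were found. Running it as the apache user is worthwhile because that account may have a shorter PATH than your own.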

On some systems the thumbnail generation runs as the apache user. On many systems apache runs as a restricted user, so before it can run firefox you must create a home directory for it (see section 1 above).

   # Check you can su to the apache user:
   #   (on Ubuntu, this is 'www-data' not 'apache'):
   #    su www-data
   % su apache

   # Try running firefox as the apache user; store
   # the settings in the profile 'test'
   # On your local machine, run 'xhost +'
   $ export DISPLAY={your IP}:0.0
   $ firefox -CreateProfile test

   # Run firefox with the display on your X window:
   $ firefox -P test http://www.textensor.com

   # Resize the window to be about 1000 x 1024 pixels.
   # When firefox starts again, it will keep
   # the size, which will be used for the screenshots
   # for the thumbnails.

   # At this point, you also need to switch off the
   # 'Restore session' window which will appear on
   # restarting firefox if it crashes for any reason.
   # 1. Type 'about:config' in the location bar.
   # 2. go to the 'browser.sessionstore.enabled' setting
   # 3. change the setting to false (double-click on the entry)

   # Quit firefox, and the settings will be saved in the profile.

To run firefox on the server, you will need to have the Xvfb frame buffer running all the time. One way to do this is to set up a cron job to check Xvfb and start it if it is not running - this will also automatically restart the X display if the process dies for any reason.

   # As root...
   % su annotate
   # We will run the X framebuffer as the 'annotate' user.

   # Check that you can start Xvfb manually:
   $ cd /var/www/html/annotate/scripts/
   $ ./startxvfb.sh

   # If this works ok, check you can run firefox using the Xvfb display:

   $ export DISPLAY=:1
   $ firefox &

   # You won't see any output, as firefox is displaying to 
   # the Xvfb virtual frame buffer.
   # Kill the firefox process if you get no error messages.
   # You can use 'jobs' to find the process number, e.g.:
   $ kill %1


   # To ensure the framebuffer is running all the time, you
   # can add a cron job.

   $ export DISPLAY={your ip}:0.0

   # Edit the cron list for the 'annotate' user.
   $ crontab -e

   # Add a line like (with the correct path to startxvfb.sh)
   * * * * * bash /var/www/html/annotate/scripts/startxvfb.sh >/dev/null 2>&1

   # You can check the Xvfb process is running after a minute using 'top'
   # It will create a X server on localhost:1.0

To test it out, you can take a snapshot of a web page, and look at its index page - it should show 'Generating thumbnail...' and then a small image. The image is stored somewhere like: /var/www/html/annotate/docs/{date}/{code}/small.png

If the thumbnail is not generated correctly, you can also try running the thumbnail generator from the command line :

   # as root...
   % su apache        # or your apache user, e.g. www-data on Ubuntu
   $ cd /var/www/html/annotate/scripts/
   $ ./fpreview.sh http://www.textensor.com
   # ... should generate a thumbnail '/tmp/small.png'
   # if it doesn't work, check the paths in the scripts/bashconfig.inc settings
   # in particular APACHE_HOME, APACHE_USER, FIREFOXEXENAME and FIREFOXEXE

5. Enabling Export of PDF with notes attached

There is Java code included to generate a PDF with the notes attached (from the Tools > Export PDF menu option). To enable this, you need to have installed Java on your server:

  # as root...
  # (on ubuntu)
  % sudo apt-get install sun-java6-jre

(other linux distributions will have different package names)

If 'java' isn't installed to the standard path, you can set the version of java to use with the $javaexe setting in php/phpconfig.inc:

 // e.g. ... in phpconfig.inc:
 $javaexe = "/opt/jre1.6.0/bin/java";

6. Set the initial note tags available to new users

Each user account maintains a list of tags which have been used by that user, and these are used to populate the tags chooser for new notes. You can initialise this list for new user accounts by editing the text file 'php/inittags.txt' - the format is plain text, one line per tag.

  cd php
  vi inittags.txt
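
If you maintain several servers you may prefer to generate the file from a script instead of editing it by hand. The sketch below writes one tag per line, which is the format described above. The function name write_init_tags and the tag names in the usage line are only examples.

```shell
# Write the given tags to FILE, one per line, replacing any existing content.
# Usage: write_init_tags FILE TAG [TAG...]
write_init_tags() {
    file=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: write_init_tags FILE TAG..." >&2
        return 1
    fi
    printf '%s\n' "$@" > "$file"
}

# Example, run from your annotate install directory:
#   write_init_tags php/inittags.txt todo question typo
```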

7. Enabling email notifications

To enable email notifications on the server (so users get sent an email when someone adds a comment to a document), you need to set up a regular CRON job to check for news. There is a PHP script php/sendEmailNotifications.php in your installation which you can run by viewing it in your browser - to set up a cron job to fetch this URL every 10 minutes:

  # as root...
  % su annotate
  $ crontab -e
*/10 * * * * /usr/bin/curl "http://www.yoursite.com/annotate/php/sendEmailNotifications.php" -o - >/dev/null 2>&1

Note that your users will have to choose to switch on email notifications for their account - there is a link on the home page, and the account page lets you control detailed settings (e.g. for immediate, hourly or daily updates).

8. Advanced configuration settings

A number of installation settings are present in the phpconfig.inc file which can be used to change the standard behaviour of A.nnotate, and use your own logo / branding / messages. The basic settings are below; see your phpconfig.inc file for details of all the options.

// Optional: Change the default note edit/delete/content settings.
// $authorOnlyDelete = 1;   // Uncomment so doc owner can't delete others' comments.
// $authorOnlyEdit = 1;     // Uncomment so doc owner can't edit others' comments.
// $fixOnReply = 1;         // Uncomment to stop notes with replies being deleted.
// $anyEdit = 1;            // [added v3.0.21] Uncomment to allow any viewer to edit others' comments.
// $allowJavascriptNotes = 1;  // [added May09] Uncomment to allow javascript: urls in notes

// Optional: Customize the welcome message in the banner of home.php
// $todaysMessage = "Welcome to A.nnotate and hello world";

// Optional: Override the A.nnotate logo displayed in the 
// top left with your own logo. You can include html;
// use an absolute URL for images, e.g.:
// $customBannerLogo = "<img border='0' src='http://www.textensor.com/textensor-200.png' />";

// Optional: Don't send users emails on creating accounts.
// $noNewAccountEmail = 1;

// Optional: Don't give users a welcome document.
// $noSampleDocument = 1;  

// Optional: Customize the footer used when exporting PDFs with notes.
// The default footer just has the page number of the orig document.
//
// For a footer like this one uncomment the settings below:
//   "Page 1. {document title} - generated by user123 - notes by [joe,jill] - visit http://yoursite.com"
//
// $pdffooter_title       = 1; // add document title too.
// $pdffooter_generatedby = 1; // add who it was generated by.
// $pdffooter_annotators  = 1; // add annotators too.
// $pdffooter = " - visit http://yoursite.com"; 

9. Configuring your PHP installation to support large document uploads

A.nnotate doesn't impose any file size limit for uploads itself, but there will be limits set in your "php.ini" apache/php configuration file. You can find what they are set to on your system, and where your php.ini config file is, by pointing your browser at a 'phpinfo.php' file which includes the line:

 <?php phpinfo(); ?>

Relevant php.ini settings are: file_uploads, upload_max_filesize, max_input_time, memory_limit, max_execution_time, post_max_size. You may want to increase the default settings, e.g. to:

  # Sample settings for php.ini:
  post_max_size=32M
  upload_max_filesize=32M
  max_execution_time=180
  max_input_time=120

You will need to restart your web server for any changes to take effect - you can view a phpinfo.php file to check their values.
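
When raising these limits, keep post_max_size at least as large as upload_max_filesize, because an uploaded file travels inside the POST body. The sketch below converts PHP's shorthand sizes (K, M, G suffixes) to bytes and compares the two values. The function names are invented for this example.

```shell
# Convert a php.ini shorthand size such as 32M or 512K to a number of bytes.
php_size_to_bytes() {
    number=${1%[KkMmGg]}
    case $1 in
        *[Kk]) echo $((number * 1024)) ;;
        *[Mm]) echo $((number * 1024 * 1024)) ;;
        *[Gg]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;
    esac
}

# Succeed if post_max_size ($1) is at least upload_max_filesize ($2).
upload_limits_ok() {
    [ "$(php_size_to_bytes "$1")" -ge "$(php_size_to_bytes "$2")" ]
}

# Example with the sample settings above:
#   php_size_to_bytes 32M        # prints 33554432
#   upload_limits_ok 32M 32M && echo "limits are consistent"
```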

10. Backing up your documents and notes

All documents are stored in the docs/ folder; all notes are stored in the private/ folder. You should take regular backups of these folders, e.g. by running a cron job which uses the rsync tool to make an incremental remote copy on another server.
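
A minimal sketch of such a backup is shown below. It copies the docs/ and private/ folders with rsync in archive mode, so repeated runs only transfer files that have changed. The function name backup_annotate, the destination host and the paths are placeholders for this example, and the RSYNC variable exists only so that a stand-in command can be substituted when trying the function out.

```shell
# Copy the docs/ and private/ folders of an annotate install to DEST.
# Usage: backup_annotate INSTALL_DIR DEST
# DEST can be a local path or a remote rsync target such as user@host:/path/
backup_annotate() {
    src=$1
    dest=$2
    # -a preserves permissions and timestamps, -z compresses data in transit.
    ${RSYNC:-rsync} -az "$src/docs" "$src/private" "$dest"
}

# Hypothetical nightly cron entry for a script that calls:
#   backup_annotate /var/www/html/annotate backup@backup.example.com:/backups/annotate/
```

For a remote destination the account running the cron job needs passwordless (key-based) ssh access to the backup server.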

11. Apache cache and cookie settings

A.nnotate has been designed to make use of client web browser caches to minimize the number of server requests for pages and notes. For Apache, a sample htaccess-cache.txt file is supplied which should be copied to annotate/.htaccess. This uses the mod_expires apache module to add a HTTP header to allow browsers to cache static content (such as page images). You need to make sure that the optional mod_expires module is enabled, so uncomment the lines below in your httpd.conf apache config file and restart the web server:

LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so

On Ubuntu, optional apache modules are enabled by creating links in /etc/apache2/mods-enabled/ that point to the files in /etc/apache2/mods-available/:

$ cd /etc/apache2/mods-enabled
$ ln -s ../mods-available/rewrite.load rewrite.load
$ ln -s ../mods-available/headers.load headers.load
$ ln -s ../mods-available/expires.load expires.load
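
The linking step can be wrapped in a small function that refuses to create a dangling link. This is a sketch only: the function name enable_apache_mod is invented, and it links just the .load file, whereas some modules also ship a .conf file that needs the same treatment. Debian and Ubuntu also provide the a2enmod helper (for example a2enmod expires), which handles both.

```shell
# Link MOD.load from mods-available into mods-enabled under APACHE_ROOT.
# Usage: enable_apache_mod APACHE_ROOT MOD
enable_apache_mod() {
    root=$1
    mod=$2
    if [ ! -f "$root/mods-available/$mod.load" ]; then
        echo "module not available: $mod" >&2
        return 1
    fi
    # The link target is relative to the mods-enabled directory.
    ln -sf "../mods-available/$mod.load" "$root/mods-enabled/$mod.load"
}

# Typical use (as root), followed by a web server restart:
#   enable_apache_mod /etc/apache2 expires
#   enable_apache_mod /etc/apache2 headers
```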

The default .htaccess file is below. You can force a page reload at any time from a browser using shift-reload (on Firefox) or ctrl-reload (on IE), or clear your browser cache then reload.

ExpiresActive On
ExpiresDefault "access plus 1 week"

11.1 Enabling compression

Configuring your apache server to serve up compressed versions of html and javascript speeds up the a.nnotate server significantly as transfers of notes and code to the browser will be faster. You need to enable the mod_deflate apache module and edit your .htaccess file or httpd.conf settings as below: (sample provided in htaccess-cache-gzip.txt)

ExpiresActive On
ExpiresDefault "access plus 1 week"

<Files *.js>
SetOutputFilter DEFLATE
</Files>

<Files *.css>
SetOutputFilter DEFLATE
</Files>

<Files *.html>
SetOutputFilter DEFLATE
</Files>

<Files *.txt>
SetOutputFilter DEFLATE
</Files>
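
To confirm that compression is really being applied, you can request a file while advertising gzip support and look for a Content-Encoding header in the response. The sketch below reads HTTP response headers on stdin. The function name is_gzip_response is invented for this example, and the URL in the usage line is a placeholder for any .js, .css, .html or .txt file on your server.

```shell
# Succeed if the HTTP response headers on stdin declare gzip encoding.
is_gzip_response() {
    grep -i '^content-encoding:.*gzip' >/dev/null
}

# Typical use: -D - prints the response headers, -o /dev/null discards the body.
#   curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' \
#        http://your.server/annotate/{some file}.js | is_gzip_response \
#        && echo "compressed"
```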

11.2 Cookies and embedding an iframe with IE

If you are embedding an A.nnotate panel in another application which is hosted on a different site from your A.nnotate server, you may encounter login problems with Internet Explorer, which by default blocks 3rd party cookies (such as the PHP session cookie) needed for logins to A.nnotate to work.

Solutions to this are:

  • (1) run a.nnotate on the same server as your external web application
  • (2) run a proxy server to make it look to the browser like a.nnotate is running on the same server (e.g. using mod_proxy and mod_rewrite)
  • (3) get your users to enable cookies from your annotate server on IE (they can do this by double-clicking on the 'no entry' icon in the browser footer);
  • (4) add a P3P http header to every message your web server returns.

For option (4), adding the P3P header, you need to enable the mod_headers optional module in httpd.conf (and restart your web server), and add the line below to your .htaccess file (a sample is included in htaccess-cookies.txt). The Microsoft IIS support site has instructions if you use the IIS web server.

Header append P3P: 'CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"'

12. Creating a robots.txt file for search engines

Search engines will not index any of the documents uploaded to your a.nnotate server unless you post a public link to the document on a website. You can also create a robots.txt file to prevent search engines indexing your content even if a link is published to the web. A sample is provided in robots-sample.txt, which you can edit and copy to the root of your web directory so it can be found as http://yoursite.com/robots.txt. For example:

User-agent: *
Disallow: /annotate/

Questions / problems:

Please email any questions to support [at] nnotate.com.