A.nnotate server installation guide
Part 2: optional modules
This chapter describes how to install the optional modules for A.nnotate, and should be read after completing the basic install (with PDF and HTML support). Note that installing these modules is more complex than installing the basic A.nnotate server. You can install all, none or just a subset of these modules, depending on your requirements.
- 1. Enabling the Apache user to run programs
- 2. OpenOffice support - so you can upload Word / Powerpoint / Excel documents
- 3. File upload progress meter - this requires a Perl utility to be installed in cgi-bin
- 4. Generating thumbnails of snapshotted websites
- 5. Enabling the 'export PDF with notes' feature which requires Java
- 6. Set the initial tags available to new users
- 7. Enabling email notifications
- 8. Advanced phpconfig.inc configuration settings
- 9. Configuring your PHP installation for large document uploads
- 10. Backing up your documents and notes
- 11. Optimizing your Apache cache settings
- 12. Creating a robots.txt file for search engines
- 13. Custom storage folders for documents and notes
- 14. HTTPS install notes
1. Enabling the apache user to run programs
Typically PHP scripts are run as the apache user, which is a restricted account with no home directory. On Ubuntu Linux, the default apache user is www-data; one way to find out which user apache runs as on your system is to type ps aux | grep httpd and see who owns the cluster of 'httpd' processes (on Debian and Ubuntu the processes are named 'apache2').
If you want to be able to run openoffice or firefox as the apache user (to support uploading Word documents and generating thumbnails of web pages), you will need to create a home directory where these applications store their profile settings.
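The check described above can be scripted. The sketch below is one possible way to do it, not part of the A.nnotate distribution: the function name web_user_from_ps is made up for illustration, and it assumes the web server processes are named 'httpd' or 'apache2'. It reads "user command" pairs on standard input and prints the non-root user that owns the most matching processes.

```shell
# Print the most common non-root owner of httpd/apache2 processes.
# Input: one "user command" pair per line (hypothetical helper, for illustration).
web_user_from_ps() {
    awk '$1 != "root" && $2 ~ /(httpd|apache2)$/ { count[$1]++ }
         END {
             best = ""; max = 0
             for (u in count) if (count[u] > max) { max = count[u]; best = u }
             if (best != "") print best
         }'
}

# Typical use on a live Linux system:
#   ps -eo user=,comm= | web_user_from_ps
```

The root-owned parent process is skipped on purpose, since the worker processes are the ones that run your PHP scripts.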
# As root: first check if the apache user
# already has a home directory that it can write to:
# N.B. if apache runs as a different user
# e.g. 'www-data' you should replace 'apache' with 'www-data'
# throughout this guide, e.g.
# su www-data
#
% su apache
$ cd
$ touch tmp.tmp
# If this works, and the apache home directory is writable by
# the apache user, you can skip the rest of this section.
# If this fails, you need to set up a home directory
# for the apache user - the example below sets it
# to '/var/www/ahome':-
% cd /var/www
% mkdir ahome
% chown apache ahome
% chgrp apache ahome
# Enable home directory for apache:
% vi /etc/passwd
# edit the entry for the apache user to allow logins
# and set the home dir, e.g.:
apache:x:48:48:Apache:/var/www/ahome:/bin/bash
# Check you can su to the apache user now:
% su apache
At this point you should also change the settings in the configuration file in annotate/scripts/bashconfig.inc, which is included by the various scripts for running openoffice and firefox. A sample is provided in bashconfig-sample.inc which you need to copy to bashconfig.inc and then edit:
% su annotate
$ cd /var/www/html/annotate/scripts
$ cp bashconfig-sample.inc bashconfig.inc
$ vi bashconfig.inc
# as the annotate user, edit the settings in annotate/scripts/bashconfig.inc:
APACHE_USER=apache
APACHE_HOME=/var/www/ahome
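Once bashconfig.inc is edited, it can be worth confirming that the home directory it names is usable. The sketch below is a hypothetical helper (check_apache_home is not shipped with A.nnotate) and assumes that bashconfig.inc consists of plain VAR=value lines that a shell can source, as the sample settings above suggest. Run it as the apache user so that the writability test reflects that account.

```shell
# Source a bashconfig-style file in a subshell and check that APACHE_HOME
# exists and is writable by the current user (hypothetical helper).
check_apache_home() {
    config=$1
    (
        . "$config" || exit 2
        [ -n "$APACHE_HOME" ] || { echo "APACHE_HOME is not set" >&2; exit 1; }
        [ -d "$APACHE_HOME" ] || { echo "$APACHE_HOME is not a directory" >&2; exit 1; }
        [ -w "$APACHE_HOME" ] || { echo "$APACHE_HOME is not writable" >&2; exit 1; }
        echo "ok: $APACHE_HOME"
    )
}

# Example, using the install path from this guide:
#   check_apache_home /var/www/html/annotate/scripts/bashconfig.inc
```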
2. Adding OpenOffice support
Install openoffice 3.x.x or later : www.openoffice.org. It will install itself to somewhere like /opt/openoffice.org3/program.
Many linux distributions only offer older versions of openoffice via their standard install mechanism, so it is worth downloading directly from the openoffice download page. To fetch the openoffice 3.x.x. binary directly onto the server, you can follow the steps below:-
# go to: - http://download.openoffice.org/other.html
# right-click on the Download link for the version you want (e.g. English-US, Linux RPM)
# and 'copy link location' - it is a HTTP redirect to a download link.
# Use this as the argument to curl to fetch the soffice.tgz as root on server:-
# Login as root...
% cd /mnt/install/downloads # or your chosen download directory
% curl -L "http://openoffice.bouncer.osuosl.org/?product=OpenOffice.org&os=linuxintelwjre&lang=en-US&version=3.1.1" -o soffice.tgz
# it's about 170Mb, so might take a while to download...
% tar xvfz soffice.tgz
% cd OOO300_m9_native_packed-1_en-US.9358/RPMS
% rpm -Uvih *.rpm
# - you may need to install gnome-vfs2 (for fedora the package is 'yum install gnome-vfs2')
# The default install location is:
# /opt/openoffice.org3/program/soffice
2.1 Optional: installing openoffice in a non-standard location
If you want to install openoffice in another directory rather than on top of any existing installation, you can use the steps described on: wiki.services.openoffice.org/wiki/Run_OOo_versions_parallel [external]. You can also use this if you do not usually use the RPM package system (e.g. on debian / ubuntu).
# ==== Optional ====
# e.g. in your home directory:
% sudo apt-get install rpm
% mkdir oo
% cd oo
% curl -L "http://openoffice.bouncer.osuosl.org/?product=OpenOffice.org&os=linuxintelwjre&lang=en-US&version=3.0.0" -o soffice.tgz
% mkdir TEMP
% cd TEMP
% tar xvfz ../soffice.tgz
% cd OOO300_m9_native_packed-1_en-US.9358/RPMS/
% mkdir TEMP_ROOT
% cd TEMP_ROOT
# extract the RPMs... will make an opt/ subdir
% for i in ../o*.rpm; do rpm2cpio $i | cpio -id; done
% mv opt ~ # or where you want the installed version
# run it using (e.g.)
% ~/opt/openoffice.org3/program/soffice &
2.2 Testing your openoffice installation:
# Check you can run openoffice as a normal user:
% su annotate
$ vi .bashrc
export PATH=/opt/openoffice.org3/program:$PATH
$ source .bashrc
# On your local machine, set xhost+ and check your firewall
# can accept X connections on the normal TCP port (6000)
$ export DISPLAY={your ip}:0.0
$ soffice &
# on Ubuntu, the executable could be called 'ooffice' not 'soffice'
2.3 Configuring A.nnotate to use openoffice:
There is a test document in annotate/scripts which you can try, but first you need to check the settings in annotate/scripts/bashconfig.inc (a sample is provided in bashconfig-sample.inc):
# As the annotate user, check the paths in the config file: scripts/bashconfig.inc:
% su annotate
$ cd /var/www/html/annotate/scripts # ... where you installed annotate
$ cp bashconfig-sample.inc bashconfig.inc # ... if bashconfig.inc not present
$ vi bashconfig.inc
OOPATH=/opt/openoffice.org3/program
OOEXE=/opt/openoffice.org3/program/soffice
OOEXENAME=soffice.bin
OOPYTHON=/opt/openoffice.org3/program/python
# If any of these are not correct, edit the bashconfig.inc file.
# The OOPYTHON setting points to the version of python which is
# bundled with the installation of openoffice from openoffice.org.
# For Ubuntu, you can change this to your standard python install
# (e.g. /usr/bin/python). You will need the python-uno package
# installed for calling openoffice.
# The following command should convert 'sample.doc' to '/tmp/sample.pdf'
$ ./ooconv.sh sample.doc /tmp/sample.pdf
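If the test conversion fails, a common cause is a wrong path in bashconfig.inc. The sketch below is an illustrative helper, not part of the distribution: check_oo_paths is a made-up name, and it again assumes bashconfig.inc is a sourceable file of VAR=value lines. It reports which of OOEXE and OOPYTHON does not point at an executable file.

```shell
# Report each of OOEXE and OOPYTHON from a bashconfig-style file that does
# not point at an executable file. Prints nothing and returns 0 if both are fine.
check_oo_paths() {
    config=$1
    (
        . "$config" || exit 2
        status=0
        for name in OOEXE OOPYTHON; do
            eval "value=\$$name"
            if [ ! -x "$value" ] || [ -d "$value" ]; then
                echo "$name: '$value' is not an executable file"
                status=1
            fi
        done
        exit "$status"
    )
}

# Example:
#   check_oo_paths /var/www/html/annotate/scripts/bashconfig.inc
```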
2.4 Running openoffice in server mode
The test conversion above started openoffice, converted a document, then killed the openoffice process. This can take a few seconds for each document. You can avoid the openoffice startup time by running it in server mode, listening to a socket for incoming documents. This also has the advantage that you can run openoffice as a separate user from the Apache one (e.g. you could create a new user just to run openoffice).
# As the user you want to run openoffice as:
# (as root)
% adduser openoffice
% su openoffice
# (as the 'openoffice' user)
$ cd /var/www/html/annotate/scripts
$ ./oocron.sh # this starts up openoffice
# Check that the 'soffice.bin' process is running:
$ ps aux | grep soffice
# Try converting a test file again a couple of times, running
# as the apache user:
# (as root)
% su apache
$ ./ooconv.sh sample.doc /tmp/test5.pdf
$ ./ooconv.sh sample.doc /tmp/test6.pdf
# All being well, the second time should have been much
# faster, as you avoid the startup time of openoffice.
# You need to keep the openoffice process alive all the time
# e.g. using a cron job, as your chosen openoffice user, adding an
# entry like:
# as root...
% su openoffice
$ crontab -e
* * * * * bash /var/www/html/annotate/scripts/oocron.sh >/dev/null 2>&1
While openoffice is running in server mode, the conversions from office formats should be much faster.
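The cron entry above works because a script that runs every minute can cheaply check whether the server process is still alive and restart it only when needed. The sketch below illustrates that keep-alive pattern in general terms. It is not the contents of oocron.sh: the function name ensure_running and the PID-file approach are assumptions made for this example, and the command to start is whatever you pass in.

```shell
# Start COMMAND in the background unless the PID recorded in PIDFILE is alive.
# Prints "running" or "started" so a cron wrapper can log what happened.
# (Illustrative pattern only; oocron.sh may work differently.)
ensure_running() {
    pidfile=$1
    shift
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "running"
        return 0
    fi
    "$@" >/dev/null 2>&1 &
    echo $! > "$pidfile"
    echo "started"
}

# Example with a stand-in command:
#   ensure_running /tmp/example.pid sleep 600
```

Calling the function twice in a row starts the command once and reports "running" the second time, which is the behaviour you want from a job that fires every minute.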
2.5 Troubleshooting OpenOffice installs on Fedora, RedHat and CentOS
If you are installing on RedHat, Fedora or CentOS, check this blog post [external] for a solution to a known bug with the yum installation system for openoffice, which can break the openoffice install if automatic updates are switched on.
If the CRON job above is not starting the openoffice process properly, then check /var/log/cron for messages; if you see entries like 'Error: PAM Access Problems', then you may need to explicitly enable the cron daemon to run tasks as the openoffice user, with a line in /etc/security/access.conf.
2.6 Updating the php/phpconfig.inc file to enable openoffice support
To enable support for the office formats when you upload a document to your annotate server, edit your php/phpconfig.inc file as follows:
# Edit the setting in php/phpconfig.inc to point to the ooconv.sh script:
% su annotate
$ cd /var/www/html/annotate/php # ... or your install directory
$ vi phpconfig.inc
$ooshcommand="/bin/bash /var/www/html/annotate/scripts/ooconv.sh";
# Test it out by uploading a short Word / openoffice file on your
# documents.php page.
2.7 Using openoffice to convert uploaded images to PDF [new Dec 2009]
You can configure openoffice to convert uploaded image files to PDF and then use the same annotation interface as text documents (by default, image files are shown using the HTML annotation interface, in a separate frame). To set this up, add the line below to your phpconfig.inc file:
// Optional: Uncomment to convert uploaded images to pdf using OO
$convertUploadedImagesToPDF = 1;
2.8 Installing Windows Fonts for OpenOffice on Linux
By default, an openoffice installation on Linux will not have access to the standard Windows fonts (Arial, Verdana etc), which can cause problems with the Word to PDF conversion for documents created on a Microsoft operating system. Unlike PDF files, Word files do not include the fonts they depend on, and assume the recipient has the relevant fonts installed. However, it is possible to install the Windows standard fonts on Linux which greatly improves the quality of generated PDFs from Word files.
# Install microsoft truetype fonts on Ubuntu / debian:
% sudo apt-get install msttcorefonts
This (external) blog entry has details on installing truetype fonts on Linux; another blog entry has notes on using the new MS Vista fonts on Linux. The basic steps for installing TrueType fonts and making them available to applications (including openoffice) are outlined below. On Windows systems, your fonts will be installed to a path like: C:\WINDOWS\Fonts\*.ttf. You will have to restart openoffice after installing fonts.
# Check you have the standard PostScript Type1 fonts installed:
# (e.g. on Fedora:)
% yum install ghostscript-fonts
# Steps for installing additional TrueType fonts on Linux
% cd /usr/share/fonts/truetype
% mkdir myfonts
% cd myfonts
# ... copy the *.ttf files to myfonts/
% mkfontdir
% fc-cache
3. File upload progress meter
To display a progress bar during upload, you need to install a Perl script into the cgi-bin/ directory, and copy the configuration settings from 'fheader-sample.inc' to 'fheader.inc':
# If you haven't yet set up your server to run cgi-bin scripts:
# As root, check the Apache configuration file
# (e.g. in /etc/httpd/conf or /etc/apache2/apache2.conf /mnt/install/apache/conf/httpd.conf)
# If you haven't set up your apache for cgi-bin, check
# that the mod_alias module is installed.
% vi /etc/httpd/conf/httpd.conf # or your Apache config file, see above
# The cgi-bin setting will be in a line like:-
ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
# You can put the Annotate perl scripts in a subdirectory:
# as root:
% mkdir /var/www/cgi-bin/annotate
% chmod a+rx /var/www/cgi-bin/annotate
% cd /var/www/cgi-bin/annotate
% cp /var/www/html/annotate/cgi-bin/* .
# The settings are in a perl file 'cgi-bin/fheader.inc' - edit this
# to make sure the paths are set correctly. The important
# setting here is the temporary directory to use for uploads,
# as it must agree with the setting in php/phpconfig.inc
# You can leave it as the default (/tmp/annotate), or change
# it in both places.
% cp fheader-sample.inc fheader.inc
% vi fheader.inc
$tmp_dir="/tmp/annotate";
% chmod a+x *
# Test running the perl script from the command line.
# If you get any error messages here, the script won't
# run from the cgi-bin directory either.
% ./fheader.inc # ... this should do nothing.
# Try visiting: http://your.server/cgi-bin/annotate/printenv.pl from browser
# to check the cgi-bin is working. It should print out a list of environment
# variables to the browser.
# Gotchas: if you get 'Internal Server Error' it could be
# caused by having DOS not Unix return characters at the end of
# the script lines. You can fix this with the dos2unix command.
# Also worth checking the log files (somewhere like /var/log/httpd)
# Some perl / cgi-bin installations have security settings which
# will only run cgi-bin programs if they have the same owner/group
# as the cgi-bin user, so you may need to set the owner of the cgi-bin
# perl scripts if this is the case.
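Before reaching for dos2unix, you can check which scripts actually contain DOS line endings. The sketch below is a small illustrative helper (the name find_crlf_files is made up here) that prints every file in its argument list containing a carriage return character.

```shell
# List the files given as arguments that contain DOS carriage returns.
find_crlf_files() {
    cr=$(printf '\r')
    for f in "$@"; do
        [ -f "$f" ] || continue
        if grep -q "$cr" "$f"; then
            echo "$f"
        fi
    done
}

# Example, using the cgi-bin directory from this guide:
#   find_crlf_files /var/www/cgi-bin/annotate/*
```

Any file it lists is a candidate for dos2unix; an empty result means line endings are unlikely to be the cause of an 'Internal Server Error'.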
# Edit the php/phpconfig.inc file to switch the file upload progress bar on:
% su annotate
$ cd /var/www/html/annotate/php
$ vi phpconfig.inc
$uploadtmpdir = "/tmp/annotate";
$fileuploadprogress = true;
$fileuploadcgibin = "/cgi-bin/annotate/";
# Try uploading a pdf file by browsing to your documents.php page;
# you should now see a blue progress bar during the upload.
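Because the temporary upload directory has to match in fheader.inc and phpconfig.inc, a quick comparison can catch a mismatch before you test in the browser. The sketch below is illustrative only: setting_value and upload_dirs_agree are hypothetical helper names, and the parsing assumes each setting is written on one line as $name = "value"; like the samples above.

```shell
# Print the double-quoted value assigned to $NAME in FILE (last match wins).
# Commented-out lines are ignored because the match is anchored at line start.
setting_value() {
    file=$1; name=$2
    sed -n 's/^[[:space:]]*\$'"$name"'[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | tail -n 1
}

# Succeed only if $tmp_dir in the Perl file equals $uploadtmpdir in the PHP file.
upload_dirs_agree() {
    perl_dir=$(setting_value "$1" tmp_dir)
    php_dir=$(setting_value "$2" uploadtmpdir)
    [ -n "$perl_dir" ] && [ "$perl_dir" = "$php_dir" ]
}

# Example:
#   upload_dirs_agree /var/www/cgi-bin/annotate/fheader.inc \
#                     /var/www/html/annotate/php/phpconfig.inc || echo "mismatch"
```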
4. Generating thumbnails of websites
To generate thumbnails of websites (displayed on your index pages next to the list of notes), you need to install a web browser (firefox) and a virtual X framebuffer (Xvfb). You will also need the netpbm tools (pnmtopng, pnmscale).
# As root:
# (on fedora core 4):-
% yum install xorg-x11
% yum install xorg-x11-Xvfb
# (on fedora 8):-
% yum install xorg-x11-server-Xorg
% yum install xorg-x11-server-Xvfb
% yum install xorg-x11-fonts-*
% yum install xorg-x11-apps-*
# (on ubuntu)
% sudo apt-get install xvfb
% yum install firefox
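After installing the packages, you may want to confirm that every program the thumbnail generator relies on is actually on the PATH. The sketch below is a minimal illustrative helper (missing_tools is a made-up name) that prints the name of each command it cannot find.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Typical use for the thumbnail prerequisites named above:
#   missing_tools Xvfb firefox pnmtopng pnmscale
```

No output means all of the listed commands were found. Run it as the user that will generate the thumbnails, since that account's PATH is the one that matters.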
On some systems the thumbnail generation will be run as the apache user. For the apache user to be able to run firefox, it needs a home directory; on many systems apache runs as a restricted user without one (see section 1 above for creating a home directory for apache).
# Check you can su to the apache user:
# (on Ubuntu, this is 'www-data' not 'apache'):
# su www-data
% su apache
# Try running firefox as the apache user; store
# the settings in the profile 'test'
# On your local machine, xhost+
$ export DISPLAY={your IP}:0.0
$ firefox -CreateProfile test
# Run firefox with the display on your X window:
$ firefox -P test http://www.textensor.com
# Resize the window to be about 1000 x 1024 pixels.
# When firefox starts again, it will keep
# the size, which will be used for the screenshots
# for the thumbnails.
# At this point, you also need to switch off the
# 'Restore session' window which will appear on
# restarting firefox if it crashes for any reason.
# 1. Type 'about:config' in the location bar.
# 2. go to the 'browser.sessionstore.enabled' setting
# 3. change the setting to false (double-click on the entry)
# Quit firefox, and the settings will be saved in the profile.
To run firefox on the server, you will need to have the Xvfb frame buffer running all the time. One way to do this is to set up a cron job to check Xvfb and start it if it is not running - this will also automatically restart the X display if the process dies for any reason.
# As root...
% su annotate
# We will run the X framebuffer as the 'annotate' user.
# Check that you can start Xvfb manually:
$ cd /var/www/html/annotate/scripts/
$ ./startxvfb.sh
# If this works ok, check you can run firefox using the Xvfb display:
$ export DISPLAY=:1
$ firefox &
# You won't see any output, as firefox is displaying to
# the Xvfb virtual frame buffer.
# Kill the firefox process if you get no error messages.
# You can use 'jobs' to find the process number, e.g.:
$ kill %1
# To ensure the framebuffer is running all the time, you
# can add a CRON job.
$ export DISPLAY={your ip}:0.0
# Edit the cron list for the 'annotate' user.
$ crontab -e
# Add a line like (with the correct path to startxvfb.sh)
* * * * * bash /var/www/html/annotate/scripts/startxvfb.sh >/dev/null 2>&1
# You can check the Xvfb process is running after a minute using 'top'
# It will create an X server on localhost:1.0
To test it out, you can take a snapshot of a web page, and look at its index page - it should show 'Generating thumbnail...' and then a small image. The image is stored somewhere like: /var/www/html/annotate/docs/{date}/{code}/small.png
If the thumbnail is not generated correctly, you can also try running the thumbnail generator from the command line:
# as root...
% su apache # or your apache user, e.g. www-data on Ubuntu
$ cd /var/www/html/annotate/scripts/
$ ./fpreview.sh http://www.textensor.com
# ... should generate a thumbnail '/tmp/small.png'
# if it doesn't work, check the paths in the scripts/bashconfig.inc settings
# in particular APACHE_HOME, APACHE_USER, FIREFOXEXENAME and FIREFOXEXE
5. Enabling Export of PDF with notes attached
There is Java code included to generate a PDF with the notes attached (from the Tools > Export PDF menu option). To enable this, you need to have installed Java on your server:
# as root... (on ubuntu)
% sudo apt-get install sun-java6
# (other linux distributions will have different package names)
If 'java' isn't installed to the standard path, you can set the version of java to use with the $javaexe setting in php/phpconfig.inc:
// e.g. ... in phpconfig.inc:
$javaexe = "/opt/jre1.6.0/bin/java";
6. Set the initial note tags available to new users
Each user account maintains a list of tags which have been used by that user, and these are used to populate the tags chooser for new notes. You can initialise this list for new user accounts by editing the text file 'php/inittags.txt' - the format is plain text, one line per tag.
cd php
vi inittags.txt
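If you prefer to generate the file from a script rather than edit it by hand, the one-tag-per-line format is easy to produce. The sketch below is illustrative: write_init_tags is a hypothetical helper, and the tag names in the example are placeholders rather than A.nnotate defaults.

```shell
# Write each remaining argument as one line of FILE, replacing its contents.
write_init_tags() {
    file=$1
    shift
    [ "$#" -gt 0 ] || return 1   # refuse to write an empty tag list
    printf '%s\n' "$@" > "$file"
}

# Example (placeholder tags):
#   write_init_tags php/inittags.txt todo question typo
```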
7. Enabling email notifications
To enable email notifications on the server (so users get sent an email when someone adds a comment to a document), you need to set up a regular CRON job to check for news. There is a PHP script php/sendEmailNotifications.php in your installation which you can run by viewing it in your browser - to set up a cron job to fetch this URL every 10 minutes:
# as root...
% su annotate
$ crontab -e
*/10 * * * * /usr/bin/curl "http://www.yoursite.com/annotate/php/sendEmailNotifications.php" -o - >/dev/null 2>&1
Note that your users will have to choose to switch on email notifications for their account - there is a link on the home page, and the account page lets you control detailed settings (e.g. for immediate, hourly or daily updates).
8. Advanced configuration settings
A number of installation settings are present in the phpconfig.inc file which can be used to change the standard behaviour of A.nnotate, and use your own logo / branding / messages. The basic settings are below, see your phpconfig.inc file for details of all the options.
// Optional: Change the default note edit/delete/content settings.
// $authorOnlyDelete = 1; // Uncomment so doc owner can't delete others' comments.
// $authorOnlyEdit = 1; // Uncomment so doc owner can't edit others' comments.
// $fixOnReply = 1; // Uncomment to stop notes with replies being deleted.
// $anyEdit = 1; // [added v3.0.21] Uncomment to allow any viewer to edit others' comments.
// $allowJavascriptNotes = 1; // [added May09] Uncomment to allow javascript: urls in notes
// $enforceLinkSharable = 1; // [added Dec09] Require invite or linkSharable setting to access doc via link
// Optional: Customize the welcome message in the banner of home.php
// $todaysMessage = "Welcome to A.nnotate and hello world";
// Optional: Override the A.nnotate logo displayed in the
// top left with your own logo. You can include html;
// use an absolute URL for images, e.g.:
// $customBannerLogo = "<img border='0' src='http://www.textensor.com/textensor-200.png' />";
// Optional: Don't send users emails on creating accounts.
// $noNewAccountEmail = 1;
// Optional: Don't give users a welcome document.
// $noSampleDocument = 1;
// Optional: Customize the footer used when exporting PDFs with notes.
// The default footer just has the page number of the orig document.
//
// For a footer like this one uncomment the settings below:
// "Page 1. {document title} - generated by user123 - notes by [joe,jill] - visit http://yoursite.com"
//
// $pdffooter_title = 1; // add document title too.
// $pdffooter_generatedby = 1; // add who it was generated by.
// $pdffooter_annotators = 1; // add annotators too.
// $pdffooter = " - visit http://yoursite.com";
9. Configuring your PHP installation to support large document uploads
A.nnotate doesn't impose any file size limit for uploads itself - but there will be limits set in your "php.ini" apache/php configuration file. You can find what they are set to on your system, and where your php.ini config file is, by pointing your browser at a 'phpinfo.php' file which includes the line:
<?php phpinfo(); ?>
Relevant php.ini settings are: file_uploads, upload_max_filesize, max_input_time, memory_limit, max_execution_time, post_max_size. You may want to increase the default settings, e.g. to:
; Sample settings for php.ini:
post_max_size=32M
upload_max_filesize=32M
max_execution_time=180
max_input_time=120
You will need to restart your web server for any changes to take effect - you can view a phpinfo.php file to check their values.
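When raising these limits, keep in mind that PHP applies both post_max_size and upload_max_filesize to an upload, so a post_max_size that is smaller than upload_max_filesize effectively caps uploads at the smaller value. The sketch below is an illustrative way to compare two values written in php.ini shorthand; ini_bytes and upload_limits_consistent are made-up helper names, and only the K, M and G suffixes are handled.

```shell
# Convert a php.ini shorthand size such as 32M, 512K or 1G to bytes.
ini_bytes() {
    value=$1
    number=${value%[KkMmGg]}
    case $value in
        *[Kk]) echo $((number * 1024)) ;;
        *[Mm]) echo $((number * 1024 * 1024)) ;;
        *[Gg]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$value" ;;
    esac
}

# Succeed only if post_max_size ($1) is at least upload_max_filesize ($2).
upload_limits_consistent() {
    [ "$(ini_bytes "$1")" -ge "$(ini_bytes "$2")" ]
}

# Example with the sample settings above:
#   upload_limits_consistent 32M 32M && echo "limits are consistent"
```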
10. Backing up your documents and notes
All documents are stored in the docs/ folder; all notes are stored in the private/ folder. You should take regular backups of these folders, e.g. by running a cron job which uses the rsync tool to make an incremental remote copy on another server.
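A backup along these lines could look like the sketch below. It is a minimal example under stated assumptions: backup_annotate is a hypothetical helper name, the destination in the usage comment is a placeholder, and rsync must be installed on the server. The -a option copies recursively and preserves permissions and timestamps, and rsync only transfers files that have changed since the previous run.

```shell
# Copy the docs/ and private/ folders of an install into DEST with rsync.
# DEST may be a local path or a remote "user@host:/path" target.
backup_annotate() {
    install_dir=$1
    dest=$2
    rsync -a "$install_dir/docs" "$install_dir/private" "$dest/"
}

# Example (placeholder destination):
#   backup_annotate /var/www/html/annotate backupuser@backup.example.com:/backups/annotate
```

To run it regularly, you could save the function and its call in a script and add that script to the crontab of a user who can read both folders.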
11. Apache cache and cookie settings
A.nnotate has been designed to make use of client web browser caches to minimize the number of server requests for pages and notes. For Apache, a sample htaccess-cache.txt file is supplied which should be copied to annotate/.htaccess. This uses the mod_expires apache module to add a HTTP header to allow browsers to cache static content (such as page images). You need to make sure that the optional mod_expires module is enabled, so uncomment the lines below in your httpd.conf apache config file and restart the web server:
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
On Ubuntu, the configuration of optional apache modules can be done by linking from /etc/apache2/mods-enabled/ to /etc/apache2/mods-available:
$ cd /etc/apache2/mods-enabled
$ ln -s ../mods-available/rewrite.load rewrite.load
$ ln -s ../mods-available/headers.load headers.load
$ ln -s ../mods-available/expires.load expires.load
You can force a page reload at any time from a browser using shift-reload (on Firefox) or ctrl-reload (on IE), or by clearing your browser cache and then reloading. The default .htaccess file is below:
ExpiresActive On
ExpiresDefault "access plus 1 week"
11.1 Enabling compression
Configuring your apache server to serve up compressed versions of html and javascript speeds up the a.nnotate server significantly as transfers of notes and code to the browser will be faster. You need to enable the mod_deflate apache module and edit your .htaccess file or httpd.conf settings as below: (sample provided in htaccess-cache-gzip.txt)
ExpiresActive On
ExpiresDefault "access plus 1 week"
<Files *.js>
SetOutputFilter DEFLATE
</Files>
<Files *.css>
SetOutputFilter DEFLATE
</Files>
<Files *.html>
SetOutputFilter DEFLATE
</Files>
<Files *.txt>
SetOutputFilter DEFLATE
</Files>
11.2 Cookies and embedding an iframe with IE
If you are embedding an A.nnotate panel in another application which is hosted on a different site from your A.nnotate server, you may encounter login problems with Internet Explorer, which by default blocks 3rd party cookies (such as the PHP session cookie) needed for logins to A.nnotate to work.
Solutions to this are:
- (1) run a.nnotate on the same server as your external web application
- (2) run a proxy server to make it look to the browser like a.nnotate is running on the same server (e.g. using mod_proxy and mod_rewrite)
- (3) get your users to enable cookies from your annotate server on IE (they can do this by double-clicking on the 'no entry' icon in the browser footer);
- (4) add a P3P http header to every message your web server returns.
For option (4), adding the P3P header, you need to enable the mod_headers optional module in httpd.conf (and restart your web server), and add the line below to your .htaccess file (a sample is included in htaccess-cookies.txt). The Microsoft IIS support site has instructions if you use the IIS web server.
Header append P3P: 'CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"'
12. Creating a robots.txt file for search engines
Search engines will not index any of the documents uploaded to your a.nnotate server unless you post a public link to the document on a website. You can also create a robots.txt file to prevent search engines indexing your content even if a link is published to the web. A sample is provided in robots-sample.txt which you can edit and copy to the root of your web directory so it can be found as http://yoursite.com/robots.txt - a sample is given below:
User-agent: *
Disallow: /annotate/
13. Custom storage locations for documents and notes
By default, documents are stored in the docs/ folder, and notes in the private/ folder of your annotate installation. It is possible to configure these, so documents are stored in any path on your system using the docsdir and privatedir phpconfig settings. This can be useful if you want to store on a network drive, or just separately from the rest of the a.nnotate install. These must also be specified if running using the Quercus java servlet implementation of PHP rather than regular apache-php.
The one complexity is that if you move the docs/ folder, you also need to edit your web server configuration to ensure that static content from http://yoursite.com/annotate/docs/ is served from the new folder too.
# $docsdir = "c:/test/resin-4.0.9/webapps/ROOT/annotate/docs/";
$privatedir = "c:/test/resin-4.0.9/webapps/ROOT/annotate/private/";
# ... or on linux:
# $docsdir = "/var/disk123/docs/";
# $privatedir = "/var/disk123/private/";
# NB you also need to configure your web server to serve
# static content from http://yoursite.com/annotate/docs/
# from the docsdir.
14. HTTPS install notes
Since v3.1.15 (Oct 2010) it has been possible to install a.nnotate on an HTTPS server. You need to configure your web server with a certificate (e.g. for testing see this external guide: [adding a self-signed HTTPS certificate with apache and ubuntu]). After this, you just need to configure the phpconfig.inc path to include https (see below), and access the server through a https: URL. Everything ought to work as normal for PDF and Word documents; however, HTML snapshots may display browser warnings because they load http: content as well as https: content.
The HTTPS support has been tested on Linux with Apache; contact us if you run into any issues on other configurations.
# sample config for https:
$nnotatepath="https://yourserver.com/annotate";
Questions / problems:
Please email any questions to support [at] nnotate.com.