Instructions for installation on Mac:
Installation on a Mac (Snow Leopard 10.6.4) with English-language OCR in mind:
1) Download tesseract-3.00.tar.gz and eng.traineddata.gz from the downloads page (http://code.google.com/p/tesseract-ocr/downloads/list)
2) Open a terminal, cd to wherever you downloaded the above files, then do:
tar -xzf tesseract-3.00.tar.gz
cd tesseract-3.00
3) You will need the libtiff library (for reading compressed TIFF files) and Leptonica. Install MacPorts if it is not already installed, then run:
sudo port install tiff
sudo port install leptonica
4) Then, still inside the tesseract-3.00 directory, run:
./configure
make
sudo make install
Alternatively, skip the source build and install the MacPorts tesseract port:
sudo port install tesseract
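Whichever route you take, it is worth confirming the needed commands are actually on your PATH before moving on. This is a minimal sketch assuming a POSIX sh; check_tools is a hypothetical helper name, not part of Tesseract or MacPorts.

```shell
# check_tools (hypothetical helper): report every named command that is
# not found on PATH. Returns 0 if all are present, 1 if any are missing.
# Uses a global variable (_missing) because POSIX sh has no "local".
check_tools() {
    _missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            _missing=1
        fi
    done
    return "$_missing"
}
```

For example, run check_tools make port before step 4, and check_tools tesseract once the install has finished; each missing name is printed to stderr.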
5) Then decompress the English language data and move it to where Tesseract looks for it:
cd ..
gunzip eng.traineddata.gz
sudo mv eng.traineddata /usr/local/share/tessdata
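Step 5 can also be wrapped in a small function, which is handy if you later add more language files. This is a sketch assuming a POSIX sh with gzip available; install_traineddata and the SUDO variable are hypothetical names introduced here. It checks the archive with gunzip -t first, then decompresses it without deleting the original .gz.

```shell
# install_traineddata (hypothetical helper)
# Usage: install_traineddata SRC.gz DEST_DIR
# Set SUDO=sudo when DEST_DIR is root-owned (e.g. /usr/local/share/tessdata).
# Uses globals (src, dest, name) because POSIX sh has no "local".
install_traineddata() {
    src=$1
    dest=$2
    name=$(basename "$src" .gz)        # eng.traineddata.gz -> eng.traineddata
    gunzip -t "$src" || return 1       # refuse corrupt or non-gzip input
    ${SUDO-} mkdir -p "$dest" || return 1
    # gunzip -c writes to stdout and leaves the .gz in place;
    # tee writes the file, under sudo if SUDO=sudo was given.
    gunzip -c "$src" | ${SUDO-} tee "$dest/$name" >/dev/null
}
```

Usage: SUDO=sudo install_traineddata eng.traineddata.gz /usr/local/share/tessdata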
You now have a working Tesseract install set up for OCR on English-language documents. In the directory containing the .tif image, run:
tesseract inputimage.tif outputtext -l eng
This should produce a file called outputtext.txt in the same directory, containing the recognized text.
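To OCR a whole folder of scans rather than a single file, the command above can be run in a loop. A minimal sketch assuming a POSIX sh; ocr_dir and the TESSERACT override variable are hypothetical names. Each NAME.tif yields NAME.txt, because tesseract appends .txt to the output base name as described above.

```shell
# ocr_dir (hypothetical helper): run tesseract on every .tif in DIR,
# writing NAME.txt next to each NAME.tif.
# Usage: ocr_dir DIR [LANG]   (LANG defaults to eng)
# TESSERACT may point at a different tesseract binary (defaults to "tesseract").
ocr_dir() {
    dir=$1
    lang=${2:-eng}
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue      # skip the literal pattern left when nothing matches
        base=${img%.tif}               # tesseract adds the .txt extension itself
        "${TESSERACT:-tesseract}" "$img" "$base" -l "$lang" || return 1
    done
}
```

Usage: ocr_dir ~/scans (or ocr_dir ~/scans eng to name the language explicitly). The loop stops at the first image tesseract fails on, so a bad file is easy to spot.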