Wednesday, November 17, 2010

Installing Tesseract on Mac OSX

These instructions cover installing Tesseract on a Mac (Snow Leopard, 10.6.4) with English-language OCR in mind:

1) Download tesseract-3.00.tar.gz and eng.traineddata.gz from the downloads page (http://code.google.com/p/tesseract-ocr/downloads/list)

2) Open a terminal, cd to wherever you downloaded the above files, then do:

tar -xf tesseract-3.00.tar.gz

cd tesseract-3.00

3) Install the libtiff library (needed to read compressed TIFF files) and Leptonica. Install MacPorts first if it is not already installed, then run:

sudo port install tiff

sudo port install leptonica
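Before running the port commands, it can help to confirm that MacPorts and the build tools are actually on your PATH. This is only a minimal sketch; the helper name have_cmd is made up for illustration:

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether the tools used in this post are available.
for tool in port tar make; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

If port shows up as missing, install MacPorts (or fix your PATH) before continuing.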

4) Then run:

./configure

make

sudo make install

Alternatively, instead of building from source, MacPorts provides a tesseract port:

sudo port install tesseract

5) Then unpack the English language data and move it into tesseract's tessdata directory:

cd ..

gunzip eng.traineddata.gz

sudo mv eng.traineddata /usr/local/share/tessdata
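A quick way to check that the language file ended up where step 5 put it is sketched below. The helper has_lang and the TESSDATA_DIR variable are names invented for this example, not anything tesseract itself defines:

```shell
#!/bin/sh
# has_lang is a hypothetical helper: succeeds if <dir>/<lang>.traineddata
# exists and is not empty.
has_lang() {
    [ -s "$1/$2.traineddata" ]
}

# /usr/local/share/tessdata is the directory used in step 5.
# TESSDATA_DIR is only a convenience variable for this sketch.
TESSDATA_DIR=${TESSDATA_DIR:-/usr/local/share/tessdata}
if has_lang "$TESSDATA_DIR" eng; then
    echo "eng.traineddata is installed in $TESSDATA_DIR"
else
    echo "eng.traineddata not found in $TESSDATA_DIR" >&2
fi
```

If the file is reported as missing, check whether tesseract was installed under a different prefix and look for a tessdata directory there.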

You now have a working install of tesseract set up to do OCR on English-language documents. Run it in the directory containing the desired .tif file:

tesseract inputimage.tif outputtext -l eng

and you should get a file called outputtext.txt in the same directory with the results!
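To process a whole folder of scans rather than a single file, the same command can be wrapped in a loop. The following is a rough sketch, not a polished tool: ocr_dir is a made-up function name, and the TESSERACT variable is an assumption of this example that lets you point at a different binary.

```shell
#!/bin/sh
# ocr_dir is a hypothetical helper: run OCR on every .tif in a directory.
# TESSERACT may be set to another binary; tesseract itself does not read it.
ocr_dir() {
    dir=${1:-.}
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue      # the pattern matched no files
        base=${img%.tif}               # strip the .tif extension
        # Same invocation as above: input image, output base name, language.
        "${TESSERACT:-tesseract}" "$img" "$base" -l eng || return 1
        echo "wrote $base.txt"
    done
}
```

Calling ocr_dir with a directory name (or with no argument, for the current directory) should leave one .txt file next to each .tif, and it stops at the first image that fails.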