README

GLOOKUP  	            Copyright (c) 2008-2014, Deniz Yuret

This is the code used in:

Deniz Yuret. 2008. Smoothing a Tera-word Language Model. In the 46th
Annual Meeting of the Association for Computational Linguistics: Human
Language Technologies.  See http://goo.gl/rmD87d for details.

The glookup program reads ngram patterns with wildcards (represented
with the '_' character) from stdin and prints their counts from the
Web1T Google ngram data (whose path is given by the -p option).
Please see glookup.1 (man page), or glookup.txt (plain text format)
for documentation.

The model.pl script optimizes and tests various language models.  See
'perldoc model.pl', or model.txt for documentation.  Typical usage:

      model.pl -patterns < text > patterns
      glookup -p web1t_path < patterns > counts
      model.pl -counts counts < text

The glookup.pl script quickly searches for a given pattern in
uncompressed Google Web1T data. Use the C version for bulk processing,
the perl version to get a few counts quickly.