- There is no use disputing with the translating machine. It will prevail. P. P. Troyanskii, 1947, based on his conception of automatic translation from 1933.
- Most statistical MT derives from IBM-style models (Brown et al., 1993), which ignore syntax and allow arbitrary word-to-word translation. Hence they are able to align any sentence pair, however mismatched. However, they have a tendency to translate long sentences into word salad. Jason Eisner, 2003.
- To learn, an entity must have several choices of behavior, a means of judging the success of its choice, and a way of improving its judgment. It is difficult to design a computer of this sort and harder yet to program a present-day computer to behave this way, but it is possible in principle. To produce high-quality translations, a computer must be able to learn to manipulate language and meaning. When the relations between language and meaning are specified, no matter in how complicated a way; when the criteria of high-quality translation are outlined, with suggestions about how to improve the criteria; and when the mode of improvement for each criterion is formulated, a computer can be built to produce high-quality translations. With technique, critique, and improvement rules specified heuristically, machine translation is at hand. A child of four can do it—why not a machine? John F. Tinker, Learning and Translating by Machines, 1963.
- In principle, one could model the distribution of dependency parses in any number of sensible or perverse ways. Jason Eisner, 1996.
- Of course, linguists do not generally keep NLP in mind as they do their work, so not all of linguistics should be expected to be useful. Noah A. Smith, Linguistic Structure Prediction, 2011.
- [...] we argue that meaningful, practical reductions in word error rate are hopeless. We point out that trigrams remain the de facto standard not because we don’t know how to beat them, but because no improvements justify the cost. Joshua Goodman, 2001.
- Neither the imagination of linguists nor the nature of text data can be confined to the classical formalisms of machine learning. Noah A. Smith, Linguistic Structure Prediction, 2011.
- Increasing F-score is often not a scientific contribution, but how you did it may be a scientific contribution. Mark Johnson, 2012.
- [...] interpreting the world in the light of your preconceptions; those preconceptions then reinforce how you reinterpret your evidence, and those strengthen your preconception [...] The model is feeding itself, is eating its own waste. Jason Eisner, on the Baum-Welch algorithm.
- The problem is not access to annotated data, the problem is access to data [...] We have grad students who are incredibly smart who are working on beer reviews and Twitter and emojis because that's where the data is, not because they are not interested in applying [clinical NLP] techniques. Philip Resnik, 2016.
- ...
- Hierarchies of features are less suited to challenges such as language, inference, and high-level planning. For example, as Noam Chomsky famously pointed out, language is filled with sentences you haven't seen before. Pure classifier systems don't know what to do with such sentences. The talent of feature detectors -- in identifying which member of some category something belongs to -- doesn't translate into understanding novel sentences, in which each sentence has its own unique meaning. Gary Marcus, 2015, chapter in "The Future of the Brain", a book of "essays by the world's leading neuroscientists" (yes, really).