John Lucy, a conscientious and conservative researcher of Whorfian hypotheses, has remarked:
While some Essentialists have acknowledged these problems with the reliability of informal methods, others have, in effect, denied their relevance. For example, Colin Phillips (2010) argues that “there is little evidence for the frequent claim that sloppy data-collection practices have harmed the development of linguistic theories”. He admits that not all is epistemologically well in syntactic theory, but adds, “I just don't think that the problems will be solved by a few rating surveys.” He concludes:

A number of considerations are relevant to formulating, testing, and evaluating Whorfian hypotheses.
Gold assumed that the hypotheses, in the case of language learning, were generative grammars (or alternatively parsers; he proves results concerning both, but for brevity we follow most of the literature and neglect the very similar results on parsers). The learner's task is conceived of as responding to an unending input data stream (ultimately complete, in that every expression eventually turns up) by enunciating a sequence of guesses at grammars.
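
The setup described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not Gold's own construction: the toy hypothesis class (grammar n accepts the strings "a", "aa", ... up to length n), the names `make_grammar` and `enumeration_learner`, the bound `max_index`, and the use of a short finite list in place of the unending data stream. The learner simply announces, after each datum, the first grammar in a fixed enumeration that is consistent with everything seen so far.

```python
from typing import Callable, Iterable, Iterator, List


def make_grammar(n: int) -> Callable[[str], bool]:
    """Hypothetical grammar n: accepts strings of 1..n copies of 'a'."""
    return lambda s: set(s) <= {"a"} and 1 <= len(s) <= n


def enumeration_learner(text: Iterable[str], max_index: int = 50) -> Iterator[int]:
    """After each datum, guess the first grammar consistent with all data so far.

    `max_index` is an illustrative bound on the enumeration; if no grammar
    up to that bound fits the data, no guess is produced for that step.
    """
    seen: List[str] = []
    for sentence in text:
        seen.append(sentence)
        for n in range(1, max_index + 1):
            grammar = make_grammar(n)
            if all(grammar(s) for s in seen):
                yield n  # the learner's current guess
                break


# A finite prefix standing in for the unending input stream.
guesses = list(enumeration_learner(["a", "aaa", "aa", "aaa", "a"]))
print(guesses)
```

On this prefix the guesses change once and then stay fixed, which is the kind of convergence the paradigm is concerned with; a finite run can only illustrate it, not establish it.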

is highly significant in plant speciation, and one out of ten species of birds is known to hybridize. There are also examples of hybridization in mammals and insects, although it often produces sterile offspring. Hybridization often produces variations that prove beneficial in extreme environments.

The entire web itself can be used as a corpus to some degree, despite its constantly changing content, its multilinguality, its many tables and images, and its total lack of quality control; but when it is, the outputs of searches are nearly always cleaned by disregarding unwanted results. For example, Google searches are blind to punctuation, capitalization, and sentence boundaries, so search results for will unfortunately include irrelevant cases, such as where a sentence like happens to be followed by a sentence like .
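
The cleaning step mentioned above can be sketched as a simple filter. The phrase and the two hits below are invented for illustration and are not the examples from the original text; `keep_hit` is likewise a hypothetical helper. It keeps a hit only if the words of the phrase occur with nothing but whitespace between them, so a match that straddles a sentence boundary is discarded.

```python
import re
from typing import List


def keep_hit(hit: str, phrase: str) -> bool:
    """Keep a hit only if the phrase occurs with just whitespace between its words.

    Matching ignores capitalization, but any punctuation between the words
    (such as a sentence-final period) blocks the match.
    """
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, hit, flags=re.IGNORECASE) is not None


# Hypothetical raw search output for a hypothetical query.
phrase = "more the merrier"
hits: List[str] = [
    "She said the more the merrier.",
    "We wanted more. The merrier guests stayed.",  # crosses a sentence boundary
]

cleaned = [h for h in hits if keep_hit(h, phrase)]
print(cleaned)
```

A filter of this kind handles only one source of noise; in practice unwanted results are often also removed by hand.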

The worry is that use of experimental methods is so resource consumptive that it would impede the formulation of linguistic theories. But this changes the subject from the importance of using reliable data as evidence in theory testing to using only experimentally gathered data in theory formulation. We are not aware of anyone who has ever suggested that at the stage of hypothesis development or theory formulation the linguist should eschew intuition. Certainly Bard et al., Schütze, Cowart, Gibson & Fedorenko, and Ferreira say no such thing. The relevant issue concerns what data should be used to test theories, which is a very different matter.

Her discussion supports the view that various highly abstract theoretical hypotheses have been defended through the use of generalizations based on unreliable data.

One of the purposes of a treebank is to permit the further investigation of a language and the checking of further linguistic hypotheses by searching a large database of previously established analyses. It can also be used to test grammars, natural language processing systems, or machine learning programs.
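
As a rough illustration of checking a hypothesis by searching stored analyses, the sketch below reads two bracketed trees and lists every NP that is an immediate daughter of a VP. The two-sentence treebank, the bracket notation and category labels used, and the helper names `parse`, `subtrees`, and `leaves` are all assumptions made for this example; real treebanks are far larger and are usually queried with dedicated tools.

```python
from typing import Iterator, List


def parse(s: str):
    """Read one bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def subtrees(tree) -> Iterator[tuple]:
    """Yield the tree and every non-leaf node below it."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def leaves(tree) -> List[str]:
    """Return the words dominated by a node, in order."""
    out: List[str] = []
    for child in tree[1]:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out


# Hypothetical two-sentence treebank.
treebank = [
    parse("(S (NP (DT the) (NN dog)) (VP (VBD barked)))"),
    parse("(S (NP (NNP Kim)) (VP (VBD saw) (NP (DT a) (NN cat))))"),
]

# Query: every NP that is an immediate daughter of a VP.
object_nps = [
    " ".join(leaves(child))
    for tree in treebank
    for node in subtrees(tree)
    if node[0] == "VP"
    for child in node[1]
    if isinstance(child, tuple) and child[0] == "NP"
]
print(object_nps)
```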