Impose a word limit on scraped texts?
Created by: russelljjarvis
If the compression/decompression ratio is deemed a valuable metric, we have a problem with short texts: texts below some fixed word count compress extremely efficiently, and they bias the metric, so that a small amount of low-entropy text ends up with a deceptively small decompression ratio.
Short texts also seemed to bias many of the other text statistics unfavorably, so it's (arguably) in our interest to impose a minimum word limit.
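
A minimal sketch of what such a filter could look like, assuming the scraped texts are plain Python strings. The names `MIN_WORDS`, `word_count`, `compression_ratio`, and `filter_by_word_limit` are hypothetical and not taken from the existing code base, the threshold of 200 words is an arbitrary placeholder that would need tuning, and `zlib` stands in for whichever compressor the project actually uses.

```python
import zlib

# Hypothetical threshold; the right value would have to be chosen empirically.
MIN_WORDS = 200


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in the text."""
    return len(text.split())


def compression_ratio(text: str) -> float:
    """Return compressed size divided by raw size, both in bytes."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot compute a ratio for empty text")
    return len(zlib.compress(raw)) / len(raw)


def filter_by_word_limit(texts, min_words=MIN_WORDS):
    """Keep only texts with at least min_words words."""
    return [t for t in texts if word_count(t) >= min_words]


if __name__ == "__main__":
    scraped = ["too short to be useful", "word " * 300]
    kept = filter_by_word_limit(scraped)
    # Only texts that pass the limit contribute to the metric.
    for text in kept:
        print(word_count(text), compression_ratio(text))
```

Applying the filter before any metric is computed would keep short texts out of all the text statistics at once, rather than special-casing the compression ratio.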