Where does the excess size show up: in reading the data, or in the call to lm()? If it is in the read, why are you reading the dummy variables at all? Would it make sense to read a single factor column instead of 80 columns of dummy variables?
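
To make the factor suggestion concrete, here is a rough sketch on simulated data. Everything in it is made up for illustration (the row count n, the 81-level factor g, the response y); it is not your data or your model. It only compares the storage for one factor column against the equivalent 80 dummy columns, and shows that lm() accepts the factor directly.

```r
set.seed(1)
n <- 10000   # hypothetical number of rows
k <- 81      # 81 levels, i.e. a baseline plus 80 dummy columns

# One factor column holding the group membership (simulated)
g <- factor(sample(seq_len(k), n, replace = TRUE), levels = seq_len(k))
y <- rnorm(n)   # simulated response, unrelated to g

# The same information as 80 numeric dummy columns (intercept column dropped)
dummies <- model.matrix(~ g)[, -1]

size_factor  <- object.size(g)
size_dummies <- object.size(dummies)
print(size_factor,  units = "Kb")
print(size_dummies, units = "Kb")

# lm() expands the factor into dummy (contrast) columns itself
fit <- lm(y ~ g)
length(coef(fit))   # intercept plus one coefficient per non-baseline level
```

Note that lm() still builds the full model matrix internally, so the factor mainly saves space in the read and in the stored data frame. If the size problem turns out to be inside lm() itself, this alone will not fix it.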