(somehow hit a keyboard shortcut to submit in the middle of writing…)
@skystrife, I think I’ll need some help with the decorator class for feature selection. I see how cached_index works, and I understand that selected_index (or whatever we call it) would need to wrap cached_index so that only the terms we use get cached. Actually, it could only be for a selected_index wrapping a memory_forward_index. In that case, could I just write a wrapper for memory_forward_index and leave it at that? That would enable feature selection for all applications using a forward_index (currently topic models and classifiers, I think?). If we wanted to do feature selection for search, we could of course make another wrapper for the inverted index. And now that I think of it, we might need one anyway for k-NN…
In terms of lines of code, I think it will be very short; it’s just a question of getting it correct. All code is in the language-model branch. The selected_index should contain a feature_selector object and use it to decide whether to return the actual postings_data object or a blank postings_data when the term is not selected.
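Roughly what I have in mind, as a minimal self-contained sketch. Everything here is a hypothetical stand-in rather than the real code in the branch: the base_index interface, the postings_for method, the is_selected check, toy_index, and the simplified postings_data struct are all assumptions made just to show the decorator shape.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using term_id = std::uint64_t;
using doc_id = std::uint64_t;

// Hypothetical, simplified stand-in for postings_data.
struct postings_data
{
    term_id primary_key;
    std::vector<std::pair<doc_id, double>> counts;
};

// Hypothetical common interface shared by the wrapped index and the wrapper.
class base_index
{
  public:
    virtual ~base_index() = default;
    virtual std::shared_ptr<postings_data> postings_for(term_id t) const = 0;
};

// Hypothetical feature_selector: here it only remembers which terms were kept.
class feature_selector
{
  public:
    explicit feature_selector(std::unordered_set<term_id> selected)
        : selected_{std::move(selected)}
    {
    }

    bool is_selected(term_id t) const
    {
        return selected_.count(t) > 0;
    }

  private:
    std::unordered_set<term_id> selected_;
};

// Toy in-memory index, standing in for whatever index gets wrapped.
class toy_index : public base_index
{
  public:
    void add(term_id t, doc_id d, double count)
    {
        data_[t].emplace_back(d, count);
    }

    std::shared_ptr<postings_data> postings_for(term_id t) const override
    {
        auto pd = std::make_shared<postings_data>();
        pd->primary_key = t;
        auto it = data_.find(t);
        if (it != data_.end())
            pd->counts = it->second;
        return pd;
    }

  private:
    std::unordered_map<term_id, std::vector<std::pair<doc_id, double>>> data_;
};

// The decorator: forwards lookups for selected terms and returns a blank
// postings_data for everything else, without touching the wrapped index.
class selected_index : public base_index
{
  public:
    selected_index(std::shared_ptr<base_index> wrapped, feature_selector selector)
        : wrapped_{std::move(wrapped)}, selector_{std::move(selector)}
    {
        assert(wrapped_ != nullptr);
    }

    std::shared_ptr<postings_data> postings_for(term_id t) const override
    {
        if (selector_.is_selected(t))
            return wrapped_->postings_for(t);
        auto blank = std::make_shared<postings_data>();
        blank->primary_key = t;
        return blank;
    }

  private:
    std::shared_ptr<base_index> wrapped_;
    feature_selector selector_;
};
```

Because the check happens before the call is forwarded, an unselected term never reaches the wrapped index, so any caching layer underneath would only ever see selected terms.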
Let me know what you think!