Writing an index wrapper for feature selection



(somehow hit a keyboard shortcut to submit in the middle of writing…)

@skystrife, I think I’ll need some help with the decorator class for feature selection. I see how cached_index works, and I understand that selected_index (or whatever we call it) would need to wrap cached_index so that only the terms we actually use get cached.
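Here is a minimal sketch of why the wrapping order matters. It assumes a simplified virtual interface with a single `search(term_id)` call. The names `index_base`, `toy_index`, `caching_index`, and `selecting_index` are hypothetical stand-ins, not MeTA’s actual classes. Because the selection layer sits outside the cache, lookups for unselected terms return early and never reach the cache, so only selected terms are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical simplified types; NOT MeTA's real API.
using term_id = std::uint64_t;

struct postings_data {
    term_id id = 0;
    std::vector<std::pair<std::uint64_t, double>> counts; // (doc id, count)
};

// Minimal interface shared by every layer of the decorator chain.
class index_base {
  public:
    virtual ~index_base() = default;
    virtual postings_data search(term_id t) = 0;
};

// Stand-in for a real index: fabricates one posting per term.
class toy_index : public index_base {
  public:
    postings_data search(term_id t) override { return postings_data{t, {{0, 1.0}}}; }
};

// Caching decorator: stores every postings_data it fetches from the inner index.
class caching_index : public index_base {
  public:
    explicit caching_index(std::unique_ptr<index_base> inner) : inner_{std::move(inner)} {}

    postings_data search(term_id t) override {
        auto it = cache_.find(t);
        if (it != cache_.end())
            return it->second; // cache hit
        auto pd = inner_->search(t);
        cache_.emplace(t, pd);
        return pd;
    }

    std::size_t cache_size() const { return cache_.size(); }

  private:
    std::unique_ptr<index_base> inner_;
    std::unordered_map<term_id, postings_data> cache_;
};

// Selection decorator: returns a blank postings_data for unselected terms,
// so those lookups never reach (or pollute) the cache below it.
class selecting_index : public index_base {
  public:
    selecting_index(std::unique_ptr<index_base> inner, std::unordered_set<term_id> selected)
        : inner_{std::move(inner)}, selected_{std::move(selected)} {}

    postings_data search(term_id t) override {
        if (selected_.count(t) == 0)
            return postings_data{t, {}}; // blank postings
        return inner_->search(t);
    }

  private:
    std::unique_ptr<index_base> inner_;
    std::unordered_set<term_id> selected_;
};
```

If the order were reversed (cache outside, selection inside), every queried term, selected or not, would end up in the cache.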

Actually, it could just be a selected_index wrapping a memory_forward_index. In that case, could I just write a wrapper for memory_forward_index and leave it at that? That would enable feature selection for every application that uses a forward_index (currently topic models and classifiers, I think?). If we wanted feature selection for search, we could of course make another wrapper for the inverted index. And now that I think of it, we might need one anyway for k-NN…

In terms of lines of code, I think it will be very short; it’s just a question of getting it correct. All the code is in the language-model branch. The selected_index should contain a feature_selector object and use it to decide whether to return the actual postings_data object or a blank postings_data when the term is not selected.
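A rough sketch of that design, under stated assumptions: `feature_selector` here is a hypothetical class that keeps the top-k terms by a precomputed score and exposes `selected(term_id)`; `toy_index` stands in for memory_forward_index; and lookups are keyed by term id as described above. None of these signatures are MeTA’s real API. The wrapper holds the selector and either forwards to the wrapped index or returns a blank postings_data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical simplified types; NOT MeTA's real API.
using term_id = std::uint64_t;

struct postings_data {
    term_id id = 0;
    std::vector<std::pair<std::uint64_t, double>> counts; // (doc id, weight)
};

// Hypothetical selector: keeps the k terms with the highest score.
class feature_selector {
  public:
    feature_selector(const std::map<term_id, double>& scores, std::size_t k) {
        std::vector<std::pair<double, term_id>> ranked;
        for (const auto& kv : scores)
            ranked.emplace_back(kv.second, kv.first);
        // Highest score first; ties broken by smaller term id.
        std::sort(ranked.begin(), ranked.end(),
                  [](const std::pair<double, term_id>& a, const std::pair<double, term_id>& b) {
                      return a.first > b.first || (a.first == b.first && a.second < b.second);
                  });
        for (std::size_t i = 0; i < std::min(k, ranked.size()); ++i)
            selected_.insert(ranked[i].second);
    }

    bool selected(term_id t) const { return selected_.count(t) > 0; }

  private:
    std::unordered_set<term_id> selected_;
};

// Wraps any Index type exposing `postings_data search(term_id)`.
template <class Index>
class selected_index {
  public:
    selected_index(Index inner, feature_selector selector)
        : inner_{std::move(inner)}, selector_{std::move(selector)} {}

    postings_data search(term_id t) {
        if (!selector_.selected(t))
            return postings_data{t, {}}; // term not selected: blank postings
        return inner_.search(t);         // selected: defer to the wrapped index
    }

  private:
    Index inner_;
    feature_selector selector_;
};

// Toy stand-in for memory_forward_index (hypothetical).
struct toy_index {
    postings_data search(term_id t) { return postings_data{t, {{0, 2.0}, {1, 1.0}}}; }
};
```

Making the wrapper a template over the wrapped index type is one possible way to reuse the same class for memory_forward_index now and an inverted index later; a common virtual base class would be another.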

Let me know what you think!