Binary serialization for meta components


#1

I’m using some meta components inside my custom wrapper. What I want to do is serialize trained, ready-to-use components on the server, send them to the actual device, and then quickly restore objects that are ready for work. At the moment I see Cereal as the way to go. Could you give me a hint on how I could achieve this without making a fork and manually adding serialization code to every class I use?


#2

This depends on what exactly you are trying to serialize. For example, most of the classifiers have a void save(std::ostream&) const function that you can invoke to save them to an output stream. This currently uses the binary format from io::packed. These can then be read from a stream by using classify::load_classifier(std::istream&). See this section of the classifier tutorial for more details.

There are similar functions for the ranker class hierarchy.
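To make the save-then-load round trip concrete, here is a minimal sketch of the same pattern. It does not use MeTA itself: toy_model, load_toy_model and round_trip are hypothetical stand-ins that only mirror the shape of save(std::ostream&) const and classify::load_classifier(std::istream&), and the one-byte length prefix is an invented toy format, not io::packed.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a trained component; this is NOT a MeTA class.
class toy_model
{
  public:
    explicit toy_model(std::string label) : label_{std::move(label)}
    {
        // The toy format below stores the length in a single byte.
        assert(label_.size() <= 255);
    }

    const std::string& label() const { return label_; }

    // Same shape as the save() described above: write to any output stream.
    void save(std::ostream& out) const
    {
        out.put(static_cast<char>(label_.size()));
        out.write(label_.data(), static_cast<std::streamsize>(label_.size()));
    }

  private:
    std::string label_;
};

// Counterpart to save(); returns nullptr if the stream is truncated.
inline std::unique_ptr<toy_model> load_toy_model(std::istream& in)
{
    const int length = in.get();
    if (length == std::istream::traits_type::eof())
        return nullptr;
    std::string label(static_cast<std::size_t>(length), '\0');
    in.read(&label[0], length);
    if (in.gcount() != length)
        return nullptr;
    return std::make_unique<toy_model>(std::move(label));
}

// "Server" side writes, "device" side reads. A stringstream stands in for
// the file or socket that would carry the bytes between the two machines.
inline std::string round_trip(const std::string& label)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    toy_model{label}.save(buffer);
    auto restored = load_toy_model(buffer);
    return restored ? restored->label() : std::string{};
}
```

With real files, opening both the std::ofstream and the std::ifstream with std::ios::binary avoids newline translation on platforms that perform it, which matters once the bytes move between operating systems.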


#3

Thanks for the answer. At the moment I use classifiers, the CRF POS tagger, and a corpus represented as a dataset_view.

Speaking of io::packed: does it produce cross-platform, architecture-independent binary files? If I serialize things on macOS x86, could I deserialize them on Linux x64?


#4

Most classifiers should serialize fine using save() on one end and load() on the other, even across platforms.

The CRF tagger might cause issues unless your platforms are both the same endianness. I have had success going from Linux x64 -> Linux x86 with it, and Linux x64 -> Windows x64, but I haven’t tried anything using ARM for example. (The problem here is that the CRF heavily uses disk_vector, so its on-disk files must match the exact in-memory representation of the same arrays since they are just being mmap()ed in.)
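To illustrate why a file that is mapped straight into memory ties you to one byte order, here is a small generic sketch. It is not disk_vector code; host_is_little_endian, read_raw and read_le are hypothetical helpers. read_raw copies bytes verbatim into an integer, which is effectively what reading an mmap()ed array does, while read_le decodes the same bytes with a fixed, explicit byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the host stores the least-significant byte first.
inline bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Interpret four bytes exactly as they sit in memory. The result depends on
// the host's byte order, just like an array read through a memory mapping.
inline std::uint32_t read_raw(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    std::uint32_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Decode four bytes as little-endian no matter what the host uses.
inline std::uint32_t read_le(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    return static_cast<std::uint32_t>(bytes[0])
           | (static_cast<std::uint32_t>(bytes[1]) << 8)
           | (static_cast<std::uint32_t>(bytes[2]) << 16)
           | (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

For the bytes 0x78 0x56 0x34 0x12, read_le yields 0x12345678 on every host, whereas read_raw yields that value only on a little-endian machine. That is consistent with the x64 to x86 and Linux to Windows transfers above working, since those platforms share a byte order.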

I do believe that io::packed is cross-platform and endian-independent. That is its intent, anyway; if it’s not, there’s a bug we should fix.
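For readers wondering how a binary format can be endian-independent at all, here is a minimal sketch of one common technique: writing integers one byte at a time in a fixed order, using a variable-length encoding. This is an illustration under my own assumptions, not a description of how io::packed is implemented; write_varint, read_varint and varint_round_trip are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Write an unsigned integer as 7-bit groups, least-significant group first.
// The high bit of each byte says whether another byte follows. Because the
// bytes are produced by shifting, the host's byte order never matters.
inline void write_varint(std::ostream& out, std::uint64_t value)
{
    while (value >= 0x80)
    {
        out.put(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.put(static_cast<char>(value));
}

// Inverse of write_varint; throws on truncated or overlong input.
inline std::uint64_t read_varint(std::istream& in)
{
    std::uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7)
    {
        const int byte = in.get();
        if (byte == std::istream::traits_type::eof())
            throw std::runtime_error{"unexpected end of stream"};
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            return value;
    }
    throw std::runtime_error{"varint is too long"};
}

// Encode into an in-memory buffer and decode it again.
inline std::uint64_t varint_round_trip(std::uint64_t value)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    write_varint(buffer, value);
    assert(buffer.good());
    return read_varint(buffer);
}
```

The trade-off against a memory-mapped layout is that every value has to be decoded on load, so this style suits models that are read once into memory rather than structures that are paged in from disk on demand.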