Serialization

Serialization is partially implemented in the Alpenglow framework. The code samples below show what can currently be serialized.

Interfaces for serialization

Many C++ classes have write(ostream& file) and read(istream& file) functions for serialization. However, these functions are not directly available through the Python interface, and some classes leave them unimplemented (they throw exceptions).

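In the case of alpenglow.cpp.Model, one can use write(std::string file_name) and read(std::string file_name) instead. Both take a file name and can be called from Python, as the offline example at the end of this page shows.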

Serialization of periodically retrained models in the online framework

Use the parameters write_model=True and base_out_file_name to write trained models in alpenglow.experiments.BatchFactorExperiment to disk. See the example below. Note that the model output directory (/path/to/your/output/dir/models/ in the example) must exist or no models will be written out. The model files will be numbered (e.g. model_1, model_2 etc. in the example).

from alpenglow.experiments import BatchFactorExperiment

data = "/path/to/your/data"
out_dir = "/path/to/your/output/dir"

factor_model_experiment = BatchFactorExperiment(
    out_file=out_dir+"/output_legacy_format",
    top_k=100,
    seed=254938879,
    dimension=10,
    write_model=True,
    base_out_file_name=out_dir+"/models/model",
    learning_rate=0.03,
    number_of_iterations=10,
    period_length=100000,
    period_mode="samplenum",
    negative_rate=30
)

rankings = factor_model_experiment.run(
    data, exclude_known=True, experimentType="online_id")
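
Since no models are written when the output directory is missing, it can help to create the directory before constructing the experiment and to list the numbered files afterwards. The following is a minimal sketch using only the standard library. The helper names ensure_model_dir and list_model_files are hypothetical and not part of Alpenglow, and the <base>_<number> naming is assumed from the model_1, model_2 example above.

```python
import os
import re


def ensure_model_dir(base_out_file_name):
    """Create the directory part of base_out_file_name if it is missing."""
    model_dir = os.path.dirname(base_out_file_name)
    if model_dir:
        os.makedirs(model_dir, exist_ok=True)
    return model_dir


def list_model_files(base_out_file_name):
    """Return files named <base>_<number>, sorted by their number.

    The naming pattern is an assumption based on the example above.
    """
    model_dir, prefix = os.path.split(base_out_file_name)
    pattern = re.compile(re.escape(prefix) + r"_(\d+)$")
    numbered = []
    for name in os.listdir(model_dir or "."):
        match = pattern.match(name)
        if match:
            numbered.append((int(match.group(1)), os.path.join(model_dir, name)))
    # Sort numerically so that model_10 comes after model_2.
    return [path for _, path in sorted(numbered)]
```

In the example above, ensure_model_dir(out_dir+"/models/model") would be called before BatchFactorExperiment is constructed, and list_model_files with the same argument after run() returns.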

You can read back your models using the same class with different parameters: set learn=False and read_model=True, and point base_in_file_name at the files written earlier. Note that the model size parameters (dimension, period_length, period_mode) must agree with the ones used for writing. However, the training parameters (learning_rate, negative_rate, number_of_iterations) may be omitted if learn is set to False.

from alpenglow.experiments import BatchFactorExperiment

data = "/path/to/your/data"
out_dir = "/path/to/your/output/dir"

factor_model_experiment = BatchFactorExperiment(
    out_file=out_dir+"/output_legacy_format",
    top_k=100,
    seed=254938879,
    dimension=10,
    learn=False,
    read_model=True,
    base_in_file_name=out_dir+"/models/model",
    period_length=100000,
    period_mode="samplenum"
)

rankings = factor_model_experiment.run(
    data, exclude_known=True, experimentType="online_id")
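
The requirement that the model size parameters agree between the writing and the reading run can be guarded with a small check before the experiment is constructed. This is only a sketch: check_model_size_parameters is a hypothetical helper, not an Alpenglow function, and it compares plain keyword-argument dictionaries rather than experiment objects.

```python
# Parameters that, according to the text above, must match between the
# experiment that wrote the models and the one that reads them back.
MODEL_SIZE_PARAMETERS = ("dimension", "period_length", "period_mode")


def check_model_size_parameters(write_params, read_params):
    """Raise ValueError if any model size parameter differs."""
    mismatched = {
        name: (write_params.get(name), read_params.get(name))
        for name in MODEL_SIZE_PARAMETERS
        if write_params.get(name) != read_params.get(name)
    }
    if mismatched:
        raise ValueError("model size parameters differ: %r" % mismatched)


# Keyword arguments taken from the two examples above.
write_params = dict(dimension=10, period_length=100000, period_mode="samplenum",
                    learning_rate=0.03, negative_rate=30)
read_params = dict(dimension=10, period_length=100000, period_mode="samplenum",
                   learn=False, read_model=True)
check_model_size_parameters(write_params, read_params)  # no exception raised
```

Keeping the shared parameters in one dictionary and passing it to both experiments with ** would avoid the mismatch in the first place.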

Alternatively, one could read back the models using alpenglow.experiments.BatchAndOnlineFactorExperiment and apply online updates on top of the pretrained batch models.

Serialization in offline experiments

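The example below trains an offline FactorModel, writes its inner model to disk, and reads it back into a second, untrained model: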

import pandas as pd
from alpenglow.offline.models import FactorModel
import alpenglow.Getter as rs

data = pd.read_csv(
    "/path/to/your/data",
    sep=' ',
    header=None,
    names=['time', 'user', 'item', 'id', 'score', 'eval']
)
model = FactorModel(
    factor_seed=254938879,
    dimension=10,
    negative_rate=9,
    number_of_iterations=20,
)
model.fit(data)

model.model.write("output_file")  # writes the model to output_file
rd = rs.RecDat()
rd.user = 3
rd.item = 5
print("prediction for user=3, item=5:", model.model.prediction(rd))

# model2 must have the same dimension as the written model
model2 = FactorModel(
    factor_seed=1234,
    dimension=10,
    negative_rate=0,
    number_of_iterations=0,
)
# to create the inner model but avoid training, we need to run fit()
# on an empty dataset
data2 = pd.DataFrame(columns=['time', 'user', 'item'])
model2.fit(data2)
model2.model.read("output_file")  # reads back the same model
print("prediction for user=3, item=5 using the read-back model:",
    model2.model.prediction(rd))
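
Instead of comparing the two printed values by eye, the round trip can be checked programmatically. The sketch below assumes only that both inner models expose prediction(record), as model.model and model2.model do above. The helper name predictions_match and the tolerances are hypothetical choices, since the precision of the serialized values is not stated here.

```python
import math


def predictions_match(model_a, model_b, records, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if both models give nearly equal predictions on all records.

    model_a and model_b only need a prediction(record) method.
    """
    return all(
        math.isclose(
            model_a.prediction(record),
            model_b.prediction(record),
            rel_tol=rel_tol,
            abs_tol=abs_tol,
        )
        for record in records
    )
```

With the objects from the example, the call would be predictions_match(model.model, model2.model, [rd]).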