alpenglow package¶

Subpackages¶

Submodules¶

alpenglow.Getter module¶

class alpenglow.Getter.Getter[source]¶

Bases: object

Responsible for creating and managing cpp objects in the alpenglow.cpp package.

collect_ = {}¶

items = {}¶

class alpenglow.Getter.MetaGetter(a, b, c)[source]¶

Bases: type

Metaclass of alpenglow.Getter.Getter. Provides utilities for creating and managing cpp objects in the alpenglow.cpp package. For more information, see Python API.

collect()[source]¶

get_and_clean()[source]¶

initialize_all(objects)[source]¶

run_self_test(i)[source]¶

set_experiment_environment(online_experiment, objects)[source]¶

alpenglow.OnlineExperiment module¶

class alpenglow.OnlineExperiment.OnlineExperiment(seed=254938879, top_k=100)[source]¶

Bases: alpenglow.ParameterDefaults.ParameterDefaults

This is the base class of every online experiment in Alpenglow. It builds the general experimental setup needed to run the online training and evaluation of a model. It also handles default parameters and the ability to override them when instantiating an experiment.

Subclasses should implement the config() method; for more information, check the documentation of this method as well.

Online evaluation in Alpenglow is done by processing the data row-by-row and evaluating the model on each new record before providing the model with the new information.

Evaluation is done by ranking the next item on the user’s toplist and saving the rank. If the item is not found in the top top_k items, the evaluation step returns NaN.
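The evaluate-then-update loop described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Alpenglow's implementation: the function `online_evaluate`, the toy popularity model, and the 1-based rank convention are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def online_evaluate(records, top_k):
    """Evaluate a toy popularity model record by record (hypothetical helper).

    records: iterable of (user, item) pairs in time order.
    Returns one rank per record; NaN if the item is outside the top_k toplist.
    """
    popularity = Counter()  # toy model state: item -> occurrences seen so far
    ranks = []
    for user, item in records:
        # 1. Evaluate first: build the toplist from what the model has seen so far.
        #    (The toy model ignores the user; a real model would personalize.)
        toplist = [i for i, _ in popularity.most_common(top_k)]
        # 2. Save the rank of the next item (1-based here, an assumption),
        #    or NaN if the item is not among the top_k items.
        ranks.append(toplist.index(item) + 1 if item in toplist else math.nan)
        # 3. Only now provide the model with the new record.
        popularity[item] += 1
    return ranks


records = [(1, "a"), (2, "a"), (1, "b"), (3, "a"), (2, "b"), (3, "c")]
ranks = online_evaluate(records, top_k=2)
print(ranks)
```

Because each record is ranked before the model is updated with it, the first occurrence of any item always yields NaN in this toy setup.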

For a brief tutorial on using this class, see Five minute tutorial.

Parameters

seed (int) – The seed used to initialize the random number generators. Should not be 0.
top_k (int) – The length of the toplists.
network_mode (bool) – Instructs the experiment to treat data as a directed graph, with source and target columns instead of user and item.

get_predictions()[source]¶

If the calculate_toplists parameter is set when calling run, this method can be used to acquire the generated toplists.

Returns

DataFrame containing the columns record_id, time, user, item, rank and prediction.

record_id is the index of the record being evaluated in the input DataFrame. Generally, there are top_k rows with the same record_id.
time is the time of the evaluation.
user is the user the toplist is generated for.
item is the item at position rank of the toplist.
prediction is the prediction given by the model for the (user, item) pair at the time of evaluation.
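As a rough illustration of the row layout described above, the following sketch regroups toplist rows by record_id. The rows are invented sample data, plain dictionaries stand in for DataFrame rows, and `toplists_by_record` is a hypothetical helper, not part of Alpenglow.

```python
from collections import defaultdict

# Invented rows shaped like the documented columns, assuming top_k = 2.
rows = [
    {"record_id": 0, "time": 10, "user": 1, "item": 7, "rank": 1, "prediction": 0.9},
    {"record_id": 0, "time": 10, "user": 1, "item": 3, "rank": 2, "prediction": 0.4},
    {"record_id": 1, "time": 12, "user": 2, "item": 3, "rank": 1, "prediction": 0.8},
    {"record_id": 1, "time": 12, "user": 2, "item": 5, "rank": 2, "prediction": 0.2},
]


def toplists_by_record(rows):
    """Collect (item, prediction) pairs per record_id, ordered by rank."""
    toplists = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["record_id"], r["rank"])):
        toplists[row["record_id"]].append((row["item"], row["prediction"]))
    return dict(toplists)


toplists = toplists_by_record(rows)
print(toplists)
```

With an actual pandas DataFrame, grouping on the record_id column with `groupby("record_id")` would serve the same purpose.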

Return type

pandas.DataFrame

run(data, experimentType=None, columns={}, verbose=True, out_file=None, exclude_known=False, initialize_all=False, calculate_toplists=False, experiment_termination_time=0, memory_log=True, shuffle_same_time=True, recode=True)[source]¶

Parameters

data (pandas.DataFrame or str) – The input data; see Five minute tutorial. If this parameter is a string, it has to be in the format specified by experimentType.
experimentType (str) – The format of the input file if data is a string.
columns (dict) – An optional mapping from the input DataFrame’s column names to the expected ones.
verbose (bool) – Whether to write information about the experiment while running.
out_file (str) – If set, the results of the experiment are also written to the file located at out_file.
exclude_known (bool) – If set to True, a user’s previously seen items are excluded from the toplist evaluation. The eval columns of the input data should be set accordingly.
calculate_toplists (bool or list) – Whether to actually compute the toplists or just the ranks (the latter is faster). It can be specified on a record-by-record basis by giving a list of booleans as the parameter. The calculated toplists can be acquired after the experiment’s end by using get_predictions. Setting this to non-False implies shuffle_same_time=False.
experiment_termination_time (int) – Stop the experiment at this timestamp.
memory_log (bool) – Whether to log the results to memory (to be used optionally with out_file).
shuffle_same_time (bool) – Whether to shuffle records with the same timestamp randomly.
recode (bool) – Whether to automatically recode the entity columns so that they are indexed from 1 to n. If False, the recoding needs to be handled before passing the DataFrame to the run method.
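Two of the preprocessing options above, recode and shuffle_same_time, can be sketched in plain Python. This is one plausible reading of the parameter descriptions, not Alpenglow's code: both functions are hypothetical, and assigning codes in order of first appearance is an assumption of the example.

```python
import random
from itertools import groupby


def recode(values):
    """Map arbitrary entity ids to integers 1..n (order of first appearance)."""
    codes = {}
    recoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1  # next unused code, starting from 1
        recoded.append(codes[value])
    return recoded, codes


def shuffle_same_time(records, seed):
    """Shuffle records that share a timestamp, keeping the overall time order.

    records: list of (time, user, item) tuples, already sorted by time.
    """
    rng = random.Random(seed)
    shuffled = []
    for _, group in groupby(records, key=lambda r: r[0]):
        block = list(group)  # all records with the same timestamp
        rng.shuffle(block)
        shuffled.extend(block)
    return shuffled


users = ["u42", "u7", "u42", "u99"]
recoded_users, user_codes = recode(users)
print(recoded_users)  # [1, 2, 1, 3]

records = [(1, 1, 10), (1, 2, 11), (1, 3, 12), (2, 1, 13)]
shuffled = shuffle_same_time(records, seed=254938879)
print(shuffled)
```

Only the order within a timestamp changes, so the record with time 2 always stays last.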

Returns

Results DataFrame if memory_log=True, empty DataFrame otherwise

Return type

pandas.DataFrame

alpenglow.ParameterDefaults module¶

class alpenglow.ParameterDefaults.ParameterDefaults(**parameters)[source]¶

Bases: object

Base class of OnlineExperiment and OfflineModel, providing utilities for parameter defaults and overriding.

check_unused_parameters()[source]¶

parameter_default(name, value)[source]¶

parameter_defaults(**defaults)[source]¶

set_parameter(name, value)[source]¶
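The methods above are listed without descriptions, so the following is only a guess at the pattern they suggest: defaults that the caller can override, plus a check for parameters that were supplied but never read. `ParameterDefaultsSketch` and all of its behavior are assumptions for illustration, not the Alpenglow implementation.

```python
class ParameterDefaultsSketch:
    """Hypothetical stand-in showing a defaults-with-override pattern."""

    def __init__(self, **parameters):
        self.parameters = dict(parameters)  # values supplied by the caller
        self.used = set()                   # names that were actually looked up

    def set_parameter(self, name, value):
        """Override a parameter after construction."""
        self.parameters[name] = value

    def parameter_default(self, name, value):
        """Return the caller's value for name, or the given default."""
        self.used.add(name)
        return self.parameters.get(name, value)

    def parameter_defaults(self, **defaults):
        """Resolve several parameters at once."""
        return {name: self.parameter_default(name, default)
                for name, default in defaults.items()}

    def check_unused_parameters(self):
        """List supplied parameters that were never looked up (likely typos)."""
        return sorted(set(self.parameters) - self.used)


# The second keyword is deliberately misspelled to show the unused check.
exp = ParameterDefaultsSketch(learning_rate=0.1, dimenson=20)
config = exp.parameter_defaults(learning_rate=0.05, dimension=10)
unused = exp.check_unused_parameters()
print(config)  # {'learning_rate': 0.1, 'dimension': 10}
print(unused)  # ['dimenson']
```

Tracking which names were read makes misspelled keyword arguments visible instead of letting them be silently ignored.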

alpenglow.PythonModel module¶

class alpenglow.PythonModel.SelfUpdatingModel[source]¶

Bases: object

class alpenglow.PythonModel.SubModel(parent)[source]¶

Bases: alpenglow.cpp.PythonModel

prediction(rec_dat)[source]¶

class alpenglow.PythonModel.SubRankingIteratorModel(parent)[source]¶

Bases: alpenglow.cpp.PythonRankingIteratorModel

iterator_get_next_(id, user)[source]¶

iterator_has_next_(id_, user, upper_bound)[source]¶

prediction(rec_dat)[source]¶

class alpenglow.PythonModel.SubToplistModel(parent)[source]¶

Bases: alpenglow.cpp.PythonToplistModel

get_top_list(u, k, exclude)[source]¶

prediction(rec_dat)[source]¶

class alpenglow.PythonModel.SubUpdater(parent)[source]¶

Bases: alpenglow.cpp.Updater

update(RecDat* rec_dat)[source]¶

Updates the associated model or other object of the simulation.

Parameters

rec_dat (RecDat*) – The newest available sample of the experiment.