alpenglow package

Subpackages

Submodules

alpenglow.Getter module

class alpenglow.Getter.Getter[source]

Bases: object

Responsible for creating and managing cpp objects in the alpenglow.cpp package.

collect_ = {}
items = {}
class alpenglow.Getter.MetaGetter(a, b, c)[source]

Bases: type

Metaclass of alpenglow.Getter.Getter. Provides utilities for creating and managing cpp objects in the alpenglow.cpp package. For more information, see Python API.

collect()[source]
get_and_clean()[source]
initialize_all(objects)[source]
run_self_test(i)[source]
set_experiment_environment(online_experiment, objects)[source]

alpenglow.OnlineExperiment module

class alpenglow.OnlineExperiment.OnlineExperiment(seed=254938879, top_k=100)[source]

Bases: alpenglow.ParameterDefaults.ParameterDefaults

This is the base class of every online experiment in Alpenglow. It builds the general experimental setup needed to run the online training and evaluation of a model. It also handles default parameters and the ability to override them when instantiating an experiment.

Subclasses should implement the config() method; for more information, check the documentation of this method as well.

Online evaluation in Alpenglow is done by processing the data row-by-row and evaluating the model on each new record before providing the model with the new information.

[Figure: online evaluation (_images/online.png)]

Evaluation is done by ranking the next item on the user’s toplist and saving the rank. If the item is not among the first top_k items of the toplist, the evaluation step returns NaN.
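The evaluate-then-update order described above can be sketched in plain Python. This is only an illustration, not Alpenglow's implementation: the popularity-based stand-in model, the function name online_evaluate, and the 1-based rank convention are all assumptions made for the example.

```python
import math
from collections import Counter


def online_evaluate(records, top_k):
    """Toy evaluate-then-update loop over (user, item) records in time order."""
    popularity = Counter()  # stand-in model: global item popularity (assumption)
    ranks = []
    for user, item in records:
        # 1. Evaluate: rank the incoming item using only past information.
        toplist = [i for i, _ in popularity.most_common(top_k)]
        # 1-based rank is an assumption of this sketch; NaN if not in the top_k.
        rank = toplist.index(item) + 1 if item in toplist else math.nan
        ranks.append(rank)
        # 2. Update: only now is the record revealed to the model.
        popularity[item] += 1
    return ranks


records = [("u1", "a"), ("u2", "a"), ("u3", "a"),
           ("u1", "b"), ("u2", "b"), ("u3", "c")]
ranks = online_evaluate(records, top_k=2)
print(ranks)
```

The first occurrence of every item yields NaN here, because the toy model has never seen the item at the moment it is asked to rank it.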

For a brief tutorial on using this class, see Five minute tutorial.

Parameters
  • seed (int) – The seed used to initialize the random number generators. Should not be 0.

  • top_k (int) – The length of the toplists.

  • network_mode (bool) – Instructs the experiment to treat data as a directed graph, with source and target columns instead of user and item.

get_predictions()[source]

If the calculate_toplists parameter is set when calling run, this method can be used to acquire the generated toplists.

Returns

DataFrame containing the columns record_id, time, user, item, rank and prediction.

  • record_id is the index of the record being evaluated in the input DataFrame. Generally, there are top_k rows with the same record_id.

  • time is the time of the evaluation

  • user is the user the toplist is generated for

  • item is the item found at position rank on the toplist

  • prediction is the prediction given by the model for the (user, item) pair at the time of evaluation.

Return type

pandas.DataFrame
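Because each evaluated record contributes several rows, it is often convenient to regroup the rows into one toplist per record_id. The following is a minimal sketch on plain dictionaries shaped like the documented columns; the row values are made up, and the helper name toplists_by_record is hypothetical. Rows of this shape could be obtained from a pandas.DataFrame with to_dict("records").

```python
from collections import defaultdict

# Hypothetical rows with the documented columns (all values are invented).
rows = [
    {"record_id": 0, "time": 10, "user": 1, "item": 7, "rank": 1, "prediction": 0.9},
    {"record_id": 1, "time": 12, "user": 2, "item": 5, "rank": 2, "prediction": 0.1},
    {"record_id": 0, "time": 10, "user": 1, "item": 3, "rank": 2, "prediction": 0.4},
    {"record_id": 1, "time": 12, "user": 2, "item": 3, "rank": 1, "prediction": 0.8},
]


def toplists_by_record(rows):
    """Return {record_id: [items ordered by rank]}."""
    grouped = defaultdict(list)
    # Sort first so that each toplist comes out in rank order.
    for row in sorted(rows, key=lambda r: (r["record_id"], r["rank"])):
        grouped[row["record_id"]].append(row["item"])
    return dict(grouped)


toplists = toplists_by_record(rows)
print(toplists)
```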

run(data, experimentType=None, columns={}, verbose=True, out_file=None, exclude_known=False, initialize_all=False, calculate_toplists=False, experiment_termination_time=0, memory_log=True, shuffle_same_time=True, recode=True)[source]
Parameters
  • data (pandas.DataFrame or str) – The input data, see Five minute tutorial. If this parameter is a string, it has to be in the format specified by experimentType.

  • experimentType (str) – The format of the input file if data is a string

  • columns (dict) – An optional mapping from the input DataFrame’s column names to the expected ones.

  • verbose (bool) – Whether to write information about the experiment while running

  • out_file (str) – If set, the results of the experiment are also written to the file located at out_file.

  • exclude_known (bool) – If set to True, a user’s previously seen items are excluded from the toplist evaluation. The eval columns of the input data should be set accordingly.

  • calculate_toplists (bool or list) – Whether to actually compute the toplists or just the ranks (the latter is faster). It can be specified on a record-by-record basis by passing a list of booleans. The calculated toplists can be acquired after the experiment has finished by using get_predictions. Setting this to anything other than False implies shuffle_same_time=False.

  • experiment_termination_time (int) – Stop the experiment at this timestamp.

  • memory_log (bool) – Whether to log the results to memory (to be used optionally with out_file)

  • shuffle_same_time (bool) – Whether to shuffle records with the same timestamp randomly.

  • recode (bool) – Whether to automatically recode the entity columns so that they are indexed from 1 to n. If False, the recoding needs to be handled before passing the DataFrame to the run method.
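When recode=False, the entity columns have to be mapped to the range 1 to n before calling run. One way to do this is sketched below; the helper name recode_ids is hypothetical, and assigning codes in order of first appearance is an assumption of this example, not necessarily the scheme Alpenglow uses internally.

```python
def recode_ids(values):
    """Map arbitrary hashable IDs to consecutive integers 1..n.

    Codes are assigned in order of first appearance (an assumption of
    this sketch).
    """
    mapping = {}
    recoded = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # next unused code, starting at 1
        recoded.append(mapping[value])
    return recoded, mapping


users = ["alice", "bob", "alice", "carol"]
recoded, mapping = recode_ids(users)
print(recoded)
print(mapping)
```

Keeping the returned mapping makes it possible to translate the recoded IDs in the results back to the original identifiers.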

Returns

Results DataFrame if memory_log=True, empty DataFrame otherwise

Return type

DataFrame

alpenglow.ParameterDefaults module

class alpenglow.ParameterDefaults.ParameterDefaults(**parameters)[source]

Bases: object

Base class of OnlineExperiment and OfflineModel, providing utilities for parameter defaults and overriding.

check_unused_parameters()[source]
parameter_default(name, value)[source]
parameter_defaults(**defaults)[source]
set_parameter(name, value)[source]
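The methods above are listed without descriptions. As a rough illustration of the defaults-and-overriding pattern this class is described as providing, here is a self-contained sketch. The class ToyParameterDefaults is hypothetical, and the behavior shown for each method is an assumption inferred from the method names, not Alpenglow's actual code.

```python
class ToyParameterDefaults:
    """Hypothetical stand-in illustrating parameter defaults and overriding."""

    def __init__(self, **parameters):
        self.parameters = dict(parameters)  # values supplied by the caller
        self.used = set()                   # names that were actually looked up

    def set_parameter(self, name, value):
        self.parameters[name] = value

    def parameter_default(self, name, value):
        # Return the caller's override if present, otherwise the default.
        self.used.add(name)
        return self.parameters.get(name, value)

    def parameter_defaults(self, **defaults):
        return {n: self.parameter_default(n, v) for n, v in defaults.items()}

    def check_unused_parameters(self):
        # Assumed behavior: complain about overrides nobody asked for,
        # which usually indicates a misspelled parameter name.
        unused = set(self.parameters) - self.used
        if unused:
            raise TypeError("Unused parameters: " + ", ".join(sorted(unused)))


params = ToyParameterDefaults(learning_rate=0.5, dimensoin=20)  # note the typo
resolved = params.parameter_defaults(learning_rate=0.1, dimension=10)
print(resolved)

try:
    params.check_unused_parameters()
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```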

alpenglow.PythonModel module

class alpenglow.PythonModel.SelfUpdatingModel[source]

Bases: object

class alpenglow.PythonModel.SubModel(parent)[source]

Bases: alpenglow.cpp.PythonModel

prediction(rec_dat)[source]
class alpenglow.PythonModel.SubRankingIteratorModel(parent)[source]

Bases: alpenglow.cpp.PythonRankingIteratorModel

iterator_get_next_(id, user)[source]
iterator_has_next_(id_, user, upper_bound)[source]
prediction(rec_dat)[source]
class alpenglow.PythonModel.SubToplistModel(parent)[source]

Bases: alpenglow.cpp.PythonToplistModel

get_top_list(u, k, exclude)[source]
prediction(rec_dat)[source]
class alpenglow.PythonModel.SubUpdater(parent)[source]

Bases: alpenglow.cpp.Updater

update(RecDat* rec_dat)[source]

Updates the associated model or other object of the simulation.

Parameters

rec_dat (RecDat*) – The newest available sample of the experiment.

Module contents