Model Implementation Tips
=========================

Implementing algorithms is fun, but there are a few things that are good to
keep in mind.  In general, development follows these priorities:

1. Correct
2. Clear
3. Fast

In that order.  Further, we always want LensKit to be *usable* in an easy
fashion.  Code implementing commonly-used models, however, may be quite
complex in order to achieve good performance.

.. _iterative-training:

Iterative Training
~~~~~~~~~~~~~~~~~~

The :class:`lenskit.training.IterativeTraining` class provides a standardized
interface and training loop support for training models with iterative
methods that pass through the training data in multiple *epochs*.  Models
that use this support extend :class:`~lenskit.training.IterativeTraining` in
addition to :class:`~lenskit.pipeline.Component`, and implement the
:meth:`~lenskit.training.IterativeTraining.training_loop` method instead of
:meth:`~lenskit.training.Trainable.train`.  Iteratively-trainable components
should also have an ``epochs`` setting on their configuration class that
specifies the number of training epochs to run.

The :meth:`~lenskit.training.IterativeTraining.training_loop` method does
three things:

1. Set up the initial data structures, preparation, etc. needed for model
   training.
2. Train the model, yielding after each training epoch.  It can optionally
   yield a set of metrics, such as training loss or update magnitudes.
3. Perform any final steps and training data cleanup.

The model should be usable after each epoch, to support things like measuring
performance on validation data.

The training loop itself is represented as a Python iterator, so that a
``for`` loop iterates over the training epochs.  While the interface
definition specifies the ``Iterator`` type in order to minimize restrictions
on component implementers, we recommend that it actually be a ``Generator``,
which allows the caller to request early termination (through the
:meth:`~collections.abc.Generator.close` method).  We also recommend that the
``training_loop()`` method only return the generator after initial data
preparation is complete, so that setup time is not included in the time taken
for the first loop iteration.  The easiest way to implement this is by
delegating to an inner loop function written as a Python generator:

.. code:: python

    def training_loop(self, data: Dataset, options: TrainingOptions):
        # do initial data setup/prep for training
        context = ...
        # pass off to the inner generator
        return self._training_loop_impl(context)

    def _training_loop_impl(self, context):
        for i in range(self.config.epochs):
            # do the model training for this epoch
            ...
            # compute the epoch's metrics (e.g. training loss)
            loss = ...
            try:
                yield {'loss': loss}
            except GeneratorExit:
                # client code has requested early termination
                break

        # any final cleanup steps
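
For concreteness, a component using this support might be declared roughly as
follows.  This is only a sketch: the names ``MyScorerConfig`` and
``MyScorer`` are hypothetical, and a plain dataclass is just one way to
declare the ``epochs`` configuration setting:

.. code:: python

    from dataclasses import dataclass

    from lenskit.pipeline import Component
    from lenskit.training import IterativeTraining

    @dataclass
    class MyScorerConfig:
        # number of training epochs to run (hypothetical default)
        epochs: int = 20

    class MyScorer(IterativeTraining, Component):
        config: MyScorerConfig

        # training_loop() is implemented as shown above; __call__ and the
        # rest of the component logic are omitted from this sketch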
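
From the caller's side, the returned loop is consumed like any other
generator.  The sketch below is illustrative, with ``data`` standing in for a
training :class:`~lenskit.data.Dataset` and ``should_stop_early()`` for a
hypothetical validation check; early termination uses the standard
:meth:`~collections.abc.Generator.close` method described above:

.. code:: python

    from lenskit.training import TrainingOptions

    model = MyScorer()
    loop = model.training_loop(data, TrainingOptions())

    for epoch, metrics in enumerate(loop, start=1):
        # metrics is whatever the component yielded (possibly None)
        print(f'epoch {epoch}: {metrics}')

        # the model is usable after each epoch, so we can check validation
        # performance and request early termination via close()
        if should_stop_early(model):
            loop.close()
            break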