10 - AutoRecSys Flashcards

1
Q

What is AutoRecSys?

A

Automation of the Recommender Systems development pipeline, from data pre-processing to model selection and post-processing of predictions

2
Q

What is the goal of AutoRecSys?

A

The goal of AutoRecSys is to make the development of a recommender system more efficient and accessible

3
Q

What is the motivation of AutoRecSys?

A
  • Automation of tedious components
  • Focus on complex development tasks rather than time-consuming tasks
  • Making development of recommender systems more accessible to the general public
  • Many decisions in development are arbitrary
  • Promote academic integrity and research
4
Q

What is AutoML?

A

Automated Machine Learning provides methods and processes to make Machine Learning accessible to non-experts, to increase efficiency, and to accelerate research in Machine Learning

5
Q

What is the CASH problem?

A

Solving the combined algorithm selection and hyperparameter optimization problem

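A hedged sketch of the formal objective, written in the common Auto-WEKA/Auto-sklearn notation (the notation is an assumption, not necessarily the lecture's): choose an algorithm and its hyperparameters jointly so that the cross-validated loss is minimal.

$$
A^{*}, \lambda^{*} \in \operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\; \lambda \in \Lambda^{(j)}} \; \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\!\left(A^{(j)}_{\lambda},\; D_{\mathrm{train}}^{(i)},\; D_{\mathrm{valid}}^{(i)}\right)
$$

Here $\mathcal{A}$ is the set of candidate algorithms, $\Lambda^{(j)}$ the hyperparameter space of algorithm $A^{(j)}$, and $\mathcal{L}$ the validation loss of $A^{(j)}$ trained with hyperparameters $\lambda$ on the $i$-th of $k$ cross-validation splits.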
6
Q

What is the Algorithm Selection Problem?

A

From a set of existing algorithms, choose the algorithm that performs best for the current problem

7
Q

Why is algorithm selection also called a meta-learning approach?

A

Algorithm selection is performed with ML methods on ML algorithms - therefore it is called a meta-learning approach

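A minimal sketch of this idea in Python (illustrative only: the meta-features, algorithm names, and the choice of a random forest as meta-learner are assumptions, not the method from the lecture): a classifier is trained on dataset-level meta-features to predict which algorithm performed best on past datasets, then queried for an unseen dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Meta-dataset: one row per historical dataset, described by meta-features
# (here: number of users, number of items, rating density -- assumed features).
meta_features = np.array([
    [1000,  500, 0.05],
    [50000, 200, 0.20],
    [300,   300, 0.60],
    [80000, 900, 0.02],
])
# Label: which algorithm performed best on that dataset in past evaluations.
best_algorithm = ["ItemKNN", "ALS", "UserKNN", "ALS"]

# The selector is itself an ML model trained on ML results -- hence meta-learning.
meta_model = RandomForestClassifier(n_estimators=100, random_state=0)
meta_model.fit(meta_features, best_algorithm)

# For an unseen dataset only its meta-features are computed and passed through.
new_dataset_meta = np.array([[20000, 400, 0.10]])
print(meta_model.predict(new_dataset_meta))
```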
8
Q

What hyperparameter optimization methods are available?

A
  • Grid Search
  • Random Search
  • Bayesian Hyperparameter Optimization
9
Q

How does the hyperparameter optimization Grid Search work?

A
  • Tests all combinations of given values for different parameters
  • Exhaustive Search -> Simple but inefficient
  • The given parameter values may not contain good values -> the search never reaches a good result
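A minimal sketch of grid search in Python (the hyperparameter names and the toy error function are assumptions for illustration only):

```python
from itertools import product

# Candidate values per hyperparameter; if a good value is missing here,
# the exhaustive search can never find it.
grid = {
    "n_factors": [10, 50, 100],
    "reg": [0.001, 0.01, 0.1],
}

def validation_error(params):
    # Placeholder for "train a model with params and return its validation error".
    return (params["n_factors"] - 60) ** 2 + 1000 * params["reg"]

# Exhaustive search: every combination of the given values is evaluated.
best_params, best_error = None, float("inf")
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    error = validation_error(params)
    if error < best_error:
        best_params, best_error = params, error

print(best_params, best_error)
```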
10
Q

How does the hyperparameter optimization Random Search work?

A
  • Tests for parameter values that are randomly generated in a given interval
  • Very high probability of finding a result close to the optimum within few iterations, provided the parameter intervals cover a sufficient part of the search space
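A minimal sketch of random search in Python (intervals and the toy error function are illustrative assumptions): values are drawn at random from given intervals instead of a fixed grid.

```python
import random

random.seed(0)

# Intervals to sample from; they must cover the good region of the search space.
intervals = {
    "n_factors": (10, 200),   # sampled as an integer
    "reg": (1e-4, 1e-1),      # sampled as a float
}

def validation_error(params):
    # Placeholder for "train a model with params and return its validation error".
    return (params["n_factors"] - 60) ** 2 + 1000 * params["reg"]

best_params, best_error = None, float("inf")
for _ in range(30):  # often only few iterations are needed to get close to the optimum
    params = {
        "n_factors": random.randint(*intervals["n_factors"]),
        "reg": random.uniform(*intervals["reg"]),
    }
    error = validation_error(params)
    if error < best_error:
        best_params, best_error = params, error

print(best_params, best_error)
```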
11
Q

How does the hyperparameter optimization Bayesian Hyperparameter Optimization work?

A
  • Structured approach to optimization
  • Principle of exploration versus exploitation
  • Very efficient, but mostly not parallelizable
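A minimal sketch of the exploration-versus-exploitation loop in Python, using a Gaussian-process surrogate and a lower-confidence-bound acquisition function (a generic illustration under assumed settings, not a specific AutoRecSys implementation; the 1-D objective is a placeholder):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # Placeholder for an expensive validation-error evaluation.
    return (x - 0.6) ** 2 + 0.05 * np.sin(20 * x)

rng = np.random.default_rng(0)
X = list(rng.uniform(0, 1, size=3))           # a few random initial evaluations
y = [objective(x) for x in X]

candidates = np.linspace(0, 1, 200)
for _ in range(15):
    # Surrogate model of the objective, refitted after every evaluation.
    gp = GaussianProcessRegressor(alpha=1e-6).fit(np.array(X).reshape(-1, 1), y)
    mean, std = gp.predict(candidates.reshape(-1, 1), return_std=True)
    # Lower confidence bound: prefer low predicted error (exploitation)
    # but also high uncertainty (exploration).
    acquisition = mean - 1.0 * std
    x_next = candidates[int(np.argmin(acquisition))]
    X.append(x_next)
    y.append(objective(x_next))               # evaluations happen sequentially,
                                              # which is why parallelization is hard

print(X[int(np.argmin(y))], min(y))
```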
12
Q

What is Cross-Validation?

A
  • Cross-Validation is standard for machine learning assessments
  • Data is grouped
  • For each group, a model with fixed hyperparameters is trained on the remaining groups and tested on the held-out group
  • Average of the test errors is given as the final result
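A minimal sketch of 5-fold cross-validation in Python (model and data are illustrative assumptions): each group is held out once for testing while the remaining groups are used for training, and the test errors are averaged.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

errors = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge(alpha=1.0)                  # fixed hyperparameters
    model.fit(X[train_idx], y[train_idx])     # train on the remaining groups
    pred = model.predict(X[test_idx])         # test on the held-out group
    errors.append(mean_squared_error(y[test_idx], pred))

print(np.mean(errors))                        # average test error is the final result
```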
13
Q

What problems does Cross-Validation have?

A
  • The best algorithm is the one that achieves the best performance on a test set
  • Cross-validation has no uniform test set; therefore, all candidates are additionally evaluated on a separate, fixed test set that is the same for everyone (see the sketch below)
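A minimal sketch of that fix in Python (same illustrative assumptions as in the previous card): a test set is split off first and never touched during cross-validation, so every candidate is finally compared on identical data.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# The fixed test set: identical for every candidate and never used for selection.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Model selection via cross-validation on the development part only.
candidates = [Ridge(alpha=a) for a in (0.01, 1.0, 100.0)]
cv_scores = [cross_val_score(m, X_dev, y_dev, cv=5).mean() for m in candidates]
best = candidates[int(np.argmax(cv_scores))]

# Final, comparable evaluation on the untouched test set.
best.fit(X_dev, y_dev)
print(mean_squared_error(y_test, best.predict(X_test)))
```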
14
Q

What are the advantages of Bayesian Hyperparameter Optimization?

A
  • Extremely powerful
  • Works for any learning task
  • Automated
15
Q

What are the disadvantages of Bayesian Hyperparameter Optimization?

A

Takes very long to evaluate many models, especially if the processes cannot be parallelized

16
Q

What is Ensembling?

A
  • Tool to extend or replace hyperparameter optimization
  • Ensemble performance matches that of hyperparameter optimization but is achieved much faster
17
Q

What is the idea behind ensembling methods?

A

Ensembling methods are based on the idea that the weighted average prediction of many different models beats the performance of a single (optimised) model

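A minimal sketch of the weighted-average idea in Python (models, weights, and data are illustrative assumptions): the predictions of several different models are combined instead of relying on one optimised model.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [Ridge(), DecisionTreeRegressor(max_depth=5), KNeighborsRegressor()]
weights = np.array([0.2, 0.5, 0.3])     # e.g. proportional to validation performance

preds = np.array([m.fit(X_train, y_train).predict(X_test) for m in models])
ensemble_pred = weights @ preds         # weighted average prediction

for m, p in zip(models, preds):
    print(type(m).__name__, mean_squared_error(y_test, p))
print("Ensemble", mean_squared_error(y_test, ensemble_pred))
```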
18
Q

What are the ensembling methods?

A
  • Bagging
  • Boosting
  • Stacking
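A minimal sketch of the three families using scikit-learn (the dataset and model choices are illustrative assumptions, not recommender-specific):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

methods = {
    # Bagging: many models on bootstrap samples, predictions are averaged.
    "bagging": BaggingRegressor(DecisionTreeRegressor(), n_estimators=50),
    # Boosting: models are added sequentially, each correcting the previous errors.
    "boosting": GradientBoostingRegressor(n_estimators=100),
    # Stacking: a meta-model learns how to combine the base models' predictions.
    "stacking": StackingRegressor(
        estimators=[("ridge", Ridge()), ("tree", DecisionTreeRegressor())],
        final_estimator=Ridge()),
}

for name, model in methods.items():
    print(name, cross_val_score(model, X, y, cv=3).mean())
```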
19
Q

What is Neural Architecture Search?

A
  • Define some basic building blocks and a strategy to search over different compositions of these building blocks (sketched below)
  • One of the most computationally intensive AutoML techniques -> benchmarks with predefined neural networks exist to combat this problem and allow comparison of approaches
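A minimal sketch of the building-block idea in Python (purely illustrative: the "blocks" are just hidden-layer widths of a scikit-learn MLP and the search strategy is plain random sampling, not a real NAS system):

```python
import random
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

random.seed(0)
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

blocks = [16, 32, 64]                  # basic building blocks: layer widths

def sample_architecture():
    # Search strategy: a random composition of 1 to 3 building blocks.
    depth = random.randint(1, 3)
    return tuple(random.choice(blocks) for _ in range(depth))

best_arch, best_score = None, -np.inf
for _ in range(5):                     # every candidate must be trained -> expensive
    arch = sample_architecture()
    model = MLPRegressor(hidden_layer_sizes=arch, max_iter=500, random_state=0)
    score = cross_val_score(model, X, y, cv=3).mean()
    if score > best_score:
        best_arch, best_score = arch, score

print(best_arch, best_score)
```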
20
Q

Which AutoRecSys Libraries did we get to know?

A
  • RecZilla
  • CaMeLS
  • Auto-Surprise
  • Auto-CaseRec
  • Lenskit-Auto
  • Elliot
21
Q

What is RecZilla?

A
  • Algorithm Selection Library specifically for recommender systems
  • Creates an offline meta-model by learning algorithm performance from meta-datasets
  • Meta-data sets consist of meta-data of the original data sets
  • Predicts the best algorithm by passing unseen meta-data through the meta-model
  • Supports many metrics and even custom metrics
  • Simple one-liner command
22
Q

What is CaMeLS?

A
  • Algorithm Selection Library specifically for Recommender Systems with a twist
  • Cooperative meta-learning service for recommender systems developed by ISG
  • Structure of meta-models similar to RecZilla
  • Client-server application
  • Meta-data and meta-model stored on server
  • The client uploads meta-data and the server returns a leaderboard of the best algorithms
  • Clients can contribute new meta-data with evaluation scores and improve the meta-model
23
Q

What are the advantages of CaMeLS?

A
  • Clients need almost no computing power to use the service
  • New data can be made available on the server by anyone
  • Meta data is anonymous by nature
24
Q

What is Auto-Surprise?

A
  • Automation of the explicit-feedback recommender systems library Surprise
  • Developed by ISG
25
Q

Which criteria of an AutoML tool does Auto-Surprise fulfill?

A
  • Automated Algorithm Selection
  • Efficient Hyperparameter Optimization Procedure
  • User-friendliness
26
Q

What are the advantages of Auto-Surprise?

A
  • Works well
  • There is a publication
27
Q

What are the disadvantages of Auto-Surprise?

A
  • Surprise is no longer maintained
  • Explicit feedback is hardly observed in practice any more
28
Q

What is Auto-CaseRec?

A
  • Automation of the recommender systems library Caserec
  • Developed by ISG
  • The library works and there is a publication, but Caserec is no longer maintained and explicit feedback is no longer interesting
29
Q

What is Lenskit-Auto?

A
  • ISG’s latest AutoRecSys tool - still in development
  • Automation of Lenskit, the most popular recommender systems library in research and practical experiments
30
Q

What are the advantages of Lenskit-Auto?

A

Still maintained and supports implicit feedback

31
Q

What is Elliot?

A
  • Elliot is not an AutoRecSys library, but a framework for reproducibility
  • Helps raise awareness of and solve reproducibility problems, enabling progress in AutoRecSys
  • Performs a complete experiment, from loading the dataset to collecting the results
32
Q

What is the core idea of Elliot?

A

Feed the system with a simple and straightforward configuration file that controls the framework by selecting experimental settings

33
Q

What are the advantages of Elliot?

A

Unravels the complexity of combining splitting strategies, model hyperparameter optimization, model training and reporting of experimental results

34
Q

What is Reproducibility?

A

Reproducibility is a central principle of AutoML and a prerequisite for scientific work in the field of artificial intelligence

35
Q

What general problems do we encounter at AutoRecSys?

A
  • Explicit-Implicit Evaluation difference
  • Offline-Online Gap
  • Clear Goals
  • Different Use Cases
36
Q

What is the Explicit-Implicit Evaluation Difference?

A
  • The evaluation of explicit and implicit feedback is so different that these two categories need to be handled in completely different ways by AutoRecSys
  • Interest is shifting to implicit feedback
37
Q

What is the Offline-Online Gap?

A
  • Differences in evaluations online and offline
  • Often impossible to formulate the difference
  • You can only optimize offline - how can you make sure you are optimizing for the right target?
38
Q

What are Clear Goals?

A
  • Recommender Systems often do not have a clearly defined goal
  • Applications of AutoRecSys techniques need clearly defined goals and the ability to optimize against them
  • Offline-Online Gap strengthens this problem
39
Q

What are Different Use Cases?

A
  • There are many possible applications for recommender systems, and each needs (drastically) different approaches, also in optimization
  • Difficult to create good automation pipelines and generalise new approaches
40
Q

What is the problem with insufficient data quality and quantity?

A
  • Researchers limited by publicly available data
  • Comparing new approaches on 20-year-old datasets is still state of the art
  • Data sets rarely public for privacy reasons and expensive to process
  • Experiments are mimicked with public datasets -> results often not the same
41
Q

Considering the problem of low quality and quantity of data sets, can algorithm selection work at all?

A
  • Algorithm Selection works with the meta-data of datasets
  • Results show that even under the prevailing constraints, automated algorithm selection leads to remarkable results
42
Q

What is the problem with the Hyperparameter Optimization Performance Wall?

A

Search time is several hours compared to a few seconds or minutes for baselines and the improvement in performance is less than one percent in some cases - is it really worth it?

43
Q

Why are there no recommender systems specific ensembles so far?

A
  • AutoRecSys Tools do not yet use ensembling
  • Results from Lenskit-Auto are promising, but is it worth the effort?
  • Ensembling is generally expensive and the results are only slightly better at the moment
44
Q

Why is there no meaningful research in the field of neural architecture search for recommender systems so far?

A

Recommender systems architectures are specific to their problem domains; there is no one-size-fits-all solution

45
Q

What is the problem with Automated Data Processing?

A

Automated pre- and post-processing is not yet available for recommender systems data sets