creating multiple datasets for use in model fitting and testing
transforming variables so that models can pick up signal more effectively (e.g. applying PCA, reducing the number of categories for variables with many distinct values, or creating additional variables/flags to highlight certain relationships in the data)
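
As a rough illustration of the two steps above, the following Python sketch splits a small made-up dataset into fitting and testing sets, collapses infrequent category levels into a single "Other" level, and adds a flag variable. Everything in it is an assumption made for illustration: the field names (age, region), the threshold of 65, the minimum count of 2, and the helper names (split_rows, fit_frequent_levels, collapse, add_flag, prepare) do not come from any particular dataset or library. PCA is left out because it needs a linear algebra library.

```python
import random
from collections import Counter


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_frequent_levels(values, min_count=2):
    """Return the category levels that occur at least min_count times."""
    counts = Counter(values)
    return {level for level, n in counts.items() if n >= min_count}


def collapse(value, kept_levels, other="Other"):
    """Map any level outside kept_levels to a single catch-all level."""
    return value if value in kept_levels else other


def add_flag(row, threshold=65):
    """Return a copy of the row with a 0/1 flag for age >= threshold."""
    out = dict(row)
    out["senior_flag"] = int(row["age"] >= threshold)
    return out


def prepare(row, kept_levels):
    """Apply the category collapsing and the flag to one row."""
    collapsed = {**row, "region": collapse(row["region"], kept_levels)}
    return add_flag(collapsed)


# Made-up example data (hypothetical fields and values).
rows = [
    {"age": 34, "region": "North"},
    {"age": 71, "region": "North"},
    {"age": 45, "region": "North"},
    {"age": 66, "region": "South"},
    {"age": 29, "region": "South"},
    {"age": 52, "region": "South"},
    {"age": 80, "region": "East"},
    {"age": 38, "region": "West"},
]

train, test = split_rows(rows)

# Decide which levels to keep using the training rows only, then apply
# the same mapping to both sets so the test set stays unseen.
kept = fit_frequent_levels([r["region"] for r in train])
train_prepared = [prepare(r, kept) for r in train]
test_prepared = [prepare(r, kept) for r in test]
```

The set of levels to keep is derived from the training rows alone and then reused for the test rows. This is one common design choice, meant to keep information from the test set out of the fitting process.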