An overview of variables and features on Boosted Insights.
At first glance, variables and features are both inputs that go into a model and seem interchangeable; however, there are nuances that separate the two.
Variables, the simpler of the two, are raw data inputs provided by our data partners (e.g., Capital Expenditure, Cash Flow / Share). Because raw data inputs are messy, most variables need to be restructured so that the machine can interpret them or so that they better fit your needs. You can use raw variables as model inputs, but only if they are already comparable across companies (e.g., Percent Growth).
Below are the three processes for restructuring variables:
- Formula - A formula combines variables or applies arithmetic to them to better fit your needs. For example, you can subtract Operating Expenses from Gross Profit to create Operating Profit.
- Normalization - Normalization is the process of restructuring variables to be comparable across companies. For example, a variable that looks at the change in stock price should be normalized because a $10 change is a huge movement for a stock that trades at $20 per share but a minor one for a stock that trades at $2000 per share. To normalize the change in stock price, we would turn it into a percentage change.
- Transformation - Transformation changes the format of your data for a different view. For example, you can transform Capital Expenditure to 1 Month Change in Capital Expenditure.
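The three processes above can be sketched in a few lines of Python. This is only an illustration, not how Boosted Insights computes anything internally: the figures are made up, and the helper names `percent_change` and `one_period_change` are hypothetical.

```python
# Hypothetical figures for a single company; not real data.
gross_profit = 500.0
operating_expenses = 320.0

# Formula: combine variables with arithmetic.
operating_profit = gross_profit - operating_expenses


def percent_change(old, new):
    """Normalization: express a change relative to its starting value."""
    return (new - old) / old * 100.0


# The same $10 move on a $20 stock and on a $2000 stock.
cheap_stock_move = percent_change(20.0, 30.0)
expensive_stock_move = percent_change(2000.0, 2010.0)


def one_period_change(series):
    """Transformation: turn a series of levels into period-over-period changes."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical month-end Capital Expenditure values.
capex_by_month = [100.0, 110.0, 105.0]
capex_one_month_change = one_period_change(capex_by_month)

print(operating_profit, cheap_stock_move, expensive_stock_move, capex_one_month_change)
```

After normalization, the $10 move reads as roughly 50% for the $20 stock and roughly 0.5% for the $2000 stock, which is what makes the two comparable.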
Features are variables that have at most one formula, one normalization, and one transformation applied to them. You can apply several formulas, normalizations, and transformations to the same variable, but each combination produces a separate feature. For example, applying 1 formula, 3 normalizations, and 2 transformations to a variable yields 6 features (1 formula × 3 normalizations × 2 transformations).
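The count in that example is a Cartesian product, which a short Python sketch can make concrete. The formula, normalization, and transformation names below are hypothetical placeholders, not options taken from the product.

```python
from itertools import product

# Hypothetical option names, used only to illustrate the counting.
formulas = ["operating_profit"]
normalizations = ["percent_change", "per_share", "percent_of_revenue"]
transformations = ["1_month_change", "3_month_change"]

# Each feature uses exactly one option from each list.
features = [
    f"{formula} | {normalization} | {transformation}"
    for formula, normalization, transformation in product(
        formulas, normalizations, transformations
    )
]

for name in features:
    print(name)

print(len(features))  # 1 * 3 * 2 combinations
```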
Although features derived from the same variable can look similar, the signals they generate may differ. However, too many similar features can create multicollinearity, where features are so highly correlated with one another that a model cannot reliably separate their individual effects.
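One simple way to screen for overly similar features is to check pairwise correlations. The sketch below assumes each feature is a plain list of values, uses made-up data, and picks an arbitrary 0.9 threshold. Pairwise correlation is only a rough screen for multicollinearity, and nothing here reflects how Boosted Insights handles it.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def highly_correlated_pairs(features, threshold=0.9):
    """Return feature-name pairs whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(features, 2)
        if abs(pearson(features[a], features[b])) >= threshold
    ]


# Hypothetical feature values across five companies.
features = {
    "price_change_1m": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price_change_3m": [2.1, 3.9, 6.2, 8.0, 9.9],
    "capex_change_1m": [5.0, 1.0, 4.0, 2.0, 3.0],
}

flagged = highly_correlated_pairs(features)
print(flagged)
```

Here only the two price-change features are flagged, since they move almost in lockstep, while the Capital Expenditure feature carries a distinct signal.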
Variable Rollup is the grouping of similar features into familiar categories to help us better understand the driving forces behind stock ratings. Every feature is grouped into one of three categories: Fundamental, Technical, or Macro. Within these categories, sub-categories and sub-sub-categories organize features further. There are Driver Ratings at each level, which help us understand how a feature or category contributes to a stock rating. To learn more, see our article on Drivers and Driver Ratings.