Topics 19-20 Flashcards
Three features of a good rating system
A good rating system will possess the following three features, which together will help entities measure the appropriateness of their internal rating systems:
- Objectivity and Homogeneity. An objective rating system will produce judgments based only on considerations tied to credit risk, while a homogeneous system implies that ratings are comparable among market segments, portfolios, and customer types.
- Specificity. A rating system is specific if it measures the distance from a default event while ignoring other financial elements that are not directly tied to potential default.
- Measurability and Verifiability. Ratings must provide correct expectations of default probabilities, which must be backtested on a continuous basis.
Key measures used to assess the risk of default: probability of default (PD), cumulative probability of default, marginal probability of default, annualized default rate (ADR)

Conditional (forward) PD
Conditional (forward) PD for year t = [PDcum(t) - PDcum(t-1)] / [1 - PDcum(t-1)], i.e., the names defaulting during year t divided by the names still performing at the start of year t.
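As a quick illustration with hypothetical cumulative PDs (the helper name `forward_pds` is just for this sketch):

```python
# Minimal sketch (hypothetical numbers): forward (conditional) PD from cumulative PDs.
# Forward PD for year t = (PDcum_t - PDcum_{t-1}) / (1 - PDcum_{t-1}),
# i.e., defaults during year t divided by names surviving at the start of year t.

def forward_pds(cumulative_pds):
    forwards = []
    prev = 0.0
    for cum in cumulative_pds:
        forwards.append((cum - prev) / (1.0 - prev))
        prev = cum
    return forwards

print(forward_pds([0.02, 0.05, 0.09]))  # [0.02, 0.0306..., 0.0421...]
```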
Compare agencies’ ratings to internal experts-based rating systems
In terms of the criteria for a good rating system, the following comparisons can be made between agencies’ ratings and internal experts-based rating systems:
- Objectivity and Homogeneity. Agencies’ ratings are 73% compliant, while internal experts-based rating systems are 30% compliant.
- Specificity. Agencies’ ratings are close to 100% compliant, while internal experts-based rating systems are 75% compliant.
- Measurability and Verifiability. Agencies’ ratings are 75% compliant, while internal experts-based rating systems are 25% compliant.

Distinguish between structural approaches and reduced-form approaches to predicting default
The foundation of a structural approach (e.g., the Merton model) is the financial and economic theoretical assumptions that describe the overall path to default. Under this approach, building a model involves estimating the formal relationships that link the relevant variables of the model. In contrast, reduced form models (e.g., statistical and numerical approaches) arrive at a final solution using the set of variables that is most statistically suitable without factoring in the theoretical or conceptual causal relationships among variables.
A significant model risk in reduced form approaches results from a model’s dependency on the sample used to estimate it. To derive valid results, there must be a strong level of homogeneity between the sample and the population to which the model is applied.
Reduced form models used for credit risk can be classified into statistical and numerical-based categories.
- Statistical-based models use variables and relations that are selected and calibrated by statistical procedures.
- Numerical-based approaches use algorithms that connect actual defaults with observed variables.
- Both approaches can aggregate profiles, such as industry, sector, size, location, capitalization, and form of incorporation, into homogeneous “top-down” segment classifications. A “bottom-up” approach may also be used, which would classify variables based on case-by-case impacts. While numerical and statistical methods are primarily considered bottom-up approaches, experts-based approaches tend to be the most bottom up.
Describe Merton model to calculate default probability and the distance to default
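A minimal sketch of the standard Merton relationships, assuming asset value V, asset volatility sigma, and a single zero-coupon debt with face value F due at T are all known (hypothetical inputs). Equity is priced as a call on firm value, the risk-neutral distance to default is d2, and PD = N(-d2); replacing r with the asset drift gives the real-world version.

```python
# Minimal sketch of the Merton model (hypothetical inputs; assumes asset value V and
# asset volatility sigma are observable, a single zero-coupon debt with face value F
# due at T, a constant risk-free rate r, and lognormally distributed asset value).
from math import log, sqrt, exp
from statistics import NormalDist

N = NormalDist().cdf

def merton(V, F, r, sigma, T):
    d1 = (log(V / F) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    equity = V * N(d1) - F * exp(-r * T) * N(d2)   # equity = call option on firm value
    debt = V - equity                              # risky debt = firm value - equity
    dd = d2                                        # (risk-neutral) distance to default
    pd = N(-d2)                                    # probability of default = N(-DD)
    return equity, debt, dd, pd

print(merton(V=100, F=80, r=0.03, sigma=0.25, T=1.0))
```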

Challenges/limitations of using the Merton model
There are many challenges associated with using the Merton model:
- Neither the asset value itself nor its associated volatility is observed.
- The structure of the underlying debt is typically very complex, as it involves differing maturities, covenants, guarantees, and other specifications.
- Because variables change so frequently, the model must be recalibrated continuously.
- Its main limitation is that it only applies to liquid, publicly traded firms.
- Using this approach for unlisted companies can be problematic due to unobservable prices and challenges with finding comparable prices.
- Finally, due to high sensitivity to market movements and underlying variables, the model tends to fall short of fully reflecting the dependence of credit risk on business and credit cycles.
Describe linear discriminant analysis (LDA), define the Z-score and its usage
- Linear discriminant analysis (LDA) is one of the most popular statistical methods used for developing scoring models. Altman's Z-score is the classic example: the overall score is a weighted sum of accounting ratios, where the weights represent each ratio's contribution to the score.
- LDA categorizes firms into two groups: the first represents performing (solvent) firms and the second represents defaulting (insolvent) firms.
- A Z cut-off point is used to differentiate both groups, although it is imperfect as both solvent and insolvent firms may have similar scores. This may lead to incorrect classifications.
- Another example of LDA is the RiskCalc® model, which was developed by Moody’s. It incorporates variables that span several areas, such as financial leverage, growth, liquidity, debt coverage, profitability, size, and assets. The model is tailored to individual countries.
- With LDA, one of the main goals is to optimize variable coefficients such that Z-scores minimize the inevitable “overlapping zone” between solvent and insolvent firms. For two groups of borrowers with similar Z-scores, the overlapping zone is a risk area where firms may end up incorrectly classified. Historical versions of LDA would sometimes consider a gray area allowing for three Z-score ranges to determine who would be granted funding: very safe borrowers, very risky borrowers, and a middle ground of borrowers that merited further investigation. Today, LDA incorporates the two additional objectives of measuring default probability and assigning ratings.
- Note that LDA models typically offer only two decisions: accept or reject. Modern internal rating systems, which are based on the concept of default probability, require more options for decisions.
- Under Altman’s original model, a score below 1.8 suggests the company is likely headed for bankruptcy, while companies with scores above 3.0 are unlikely to go bankrupt (see the sketch below).
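For reference, a sketch of the original (1968) Altman Z-score for public manufacturing firms, using the classic published weights and hypothetical inputs:

```python
# Minimal sketch: Altman's original Z-score for public manufacturing firms
# (classic 1968 weights; the inputs below are hypothetical).
def altman_z(working_capital, retained_earnings, ebit, market_value_equity,
             sales, total_assets, total_liabilities):
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

# Common interpretation: Z < 1.8 -> distress zone, Z > 3.0 -> safe zone, in between -> gray zone.
z = altman_z(25, 40, 30, 150, 220, 200, 90)
print(round(z, 2))
```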

Calibration of LDA models
The process of fitting empirical data into a statistical model is called calibration.
This process implies that more work is still needed, even after the scoring function is estimated and Z-scores are obtained, before the model can be used.
- In the case of the model being used simply to accept or reject credit applications, calibration involves only adjusting the Z-score cut-off to account for differences between sample and population default rates.
- In the case of the model being used to categorize borrowers into different ratings classes (thereby assigning default probabilities to borrowers), calibration will include a cut-off adjustment and a potential rescaling of Z-score default quantifications.
Because of the relative infrequency of actual defaults, a more accurate model can be derived by attempting to create more balanced samples with relatively equal (in size) groups of both performing and defaulting firms. However, the risk of equalizing the sample group sizes is that the model applied to a real population will tend to overpredict defaults. To protect against this risk, the results obtained from the sample must be calibrated. If the model is only used to classify potential borrowers into performing versus defaulting firms, calibration will only involve adjusting the Z cut-off using Bayes’ theorem to equate the frequency of defaulting borrowers per the model to the frequency in the actual population.
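A sketch of that Bayes-based rescaling, assuming the standard prior-correction formula and hypothetical default rates (`calibrate_pd` is an illustrative helper, not a library function):

```python
# Minimal sketch (hypothetical rates): Bayes' theorem prior correction used to rescale
# model outputs estimated on a balanced sample (sample default rate pi_sample) so that
# they match the much lower default frequency of the real population (pi_population).
def calibrate_pd(model_pd, pi_sample, pi_population):
    num = model_pd * (pi_population / pi_sample)
    den = num + (1.0 - model_pd) * ((1.0 - pi_population) / (1.0 - pi_sample))
    return num / den

# A score implying 50% PD on a 50/50 sample maps to ~2% PD if the population
# default rate is 2%.
print(calibrate_pd(0.50, pi_sample=0.50, pi_population=0.02))
```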

Describe the application of logistic regression model to estimate default probability
Logistic regression models (also known as LOGIT models), which are from the Generalized Linear Model (GLM) family, are statistical tools that are also used to predict default.
GLMs typically have three common elements:
- A systematic component, which specifies the variables used in a linear predictor function.
- A random component, which identifies both the target variable and its associated probability function.
- A link function, which is a function of the target variable mean that the model ties to the systematic component.
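The sketch below, assuming scikit-learn is installed and using synthetic data, illustrates these elements: two hypothetical ratios form the systematic component, the default flag is the random (Bernoulli) component, and the logit link maps the linear predictor to a PD between 0 and 1.

```python
# Minimal sketch (synthetic data; assumes scikit-learn is installed): a LOGIT model
# mapping two hypothetical financial ratios to a probability of default.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
leverage = rng.uniform(0.0, 1.0, n)          # hypothetical debt / assets
coverage = rng.uniform(0.5, 5.0, n)          # hypothetical EBIT / interest expense
# Synthetic default flags: higher leverage and lower coverage raise default odds.
logit = -2.0 + 4.0 * leverage - 1.0 * coverage
default = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([leverage, coverage])
model = LogisticRegression().fit(X, default)

# Predicted PD for a new borrower with leverage 0.7 and coverage 1.5:
print(model.predict_proba([[0.7, 1.5]])[0, 1])
```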

Define and interpret cluster analysis
Both LDA and LOGIT methodologies are considered “supervised” due to having a defined dependent variable (the default event), while independent variables are applied to determine an ex ante prediction. When the dependent variable is not explicitly defined, the statistical technique is considered “unsupervised.”
Cluster analysis looks to identify groups of similar cases in a data set. Groups represent observation subsets that exhibit homogeneity (i.e., similarities) due to variables’ profiles that allow them to be distinguished from those found in other groups.
Two approaches can be used to implement cluster analysis:
- hierarchical/aggregative clustering and
- divisive/partitioned clustering.
With hierarchical clustering, cluster hierarchies are created and aggregated on a case-by-case basis to form a tree structure, with the clusters shown as leaves and the whole population shown as the root. Clusters are merged together beginning at the leaves, and branches are followed until arriving at the root. The end result of the analysis typically produces three forms:
- A small number of highly homogeneous, large clusters.
- Some small clusters with comprehensible and well-defined specificities.
- Single, very specific, nonaggregated units.
One of the key benefits of this method is the detection of anomalies. Many borrowers, such as merged (or demerged) companies, start-ups, and companies in liquidation, are unique. This analysis facilitates identifying these unique profiles and managing them separately from other observations.
Divisive clustering begins at the root and splits clusters based on algorithms that assign every observation to the specific cluster whose center (the average of all points in the cluster) is nearest. This approach serves to force the population into fewer cluster groups than what would be found under aggregative clustering. On the other hand, high computational power is needed, as expanding the number of observations has an exponential impact.
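A sketch of both approaches on synthetic borrower ratios, assuming SciPy and scikit-learn are installed (k-means stands in here for the partitioned, nearest-center assignment described above):

```python
# Minimal sketch (synthetic data; assumes SciPy and scikit-learn are installed):
# hierarchical/aggregative clustering versus a partitioned, nearest-center assignment
# (illustrated with k-means).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two hypothetical borrower segments described by (leverage, profitability).
ratios = np.vstack([
    rng.normal([0.3, 0.10], 0.05, size=(50, 2)),
    rng.normal([0.7, 0.02], 0.05, size=(50, 2)),
])

# Aggregative (bottom-up) clustering: build the tree, then cut it into 2 clusters.
tree = linkage(ratios, method="ward")
hier_labels = fcluster(tree, t=2, criterion="maxclust")

# Partitioned clustering: assign each firm to the nearest of 2 cluster centers.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(ratios)

print(np.bincount(hier_labels)[1:], np.bincount(kmeans_labels))
```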
As an example of applying cluster analysis, we can look to composite measures of profitability such as ROE and ROI. The task is to identify both specific aspects of a firm’s financial profile and latent (hidden) variables underlying the ratio system, such that the basic information from a firm’s financial statements can be extracted and used for modeling without redundant data and information.
Define and interpret principal component analysis
- Principal component analysis involves transforming an original tabular data set into a second, derived tabular data set.
- The performance of a given variable (equal to variance explained divided by total original variance) is referred to as communality, and the higher the communality (the more general the component is), the more relevant its ability to summarize an original set of variables into a new composed variable.
- The starting point is the extraction of the first component that achieves maximum communality. The second extraction will focus on the residuals not explained by the first component. This process will continue until we have a new principal components set, which will be orthogonal (statistically independent) by design and explain original variance in descending order. In terms of a stopping point, potential thresholds include reaching a minimum predefined variance level or a minimum communality that assures a reasonable level of information using the new set of components.
- An eigenvalue is a measure of the communality associated with an extracted component. The ideal first component is one that corresponds to the first eigenvalue of the set of variables. The second component will ideally correspond to the first eigenvalue extracted on the residuals. All original variables, once standardized, contribute a value of one to the total variance.
- An eigenvalue greater (less) than one implies that the component summarizes a share of the total variance that exceeds (is less than) the information provided by a single original variable. Therefore, it is common that only principal components with eigenvalues greater than one are considered.
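A sketch of this workflow, assuming scikit-learn is installed: standardize a set of synthetic, partly redundant ratios, extract components, and keep those whose eigenvalues exceed one.

```python
# Minimal sketch (synthetic data; assumes scikit-learn is installed): principal component
# analysis on standardized ratios, keeping components with eigenvalue > 1.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
base = rng.normal(size=(200, 2))
# Five hypothetical, partly redundant ratios driven by two latent dimensions.
ratios = np.column_stack([
    base[:, 0], base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1], base[:, 1] + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

Z = StandardScaler().fit_transform(ratios)
pca = PCA().fit(Z)
eigenvalues = pca.explained_variance_
keep = eigenvalues > 1.0                      # keep-if-eigenvalue-exceeds-one rule described above
components = pca.transform(Z)[:, keep]        # orthogonal "new" variables

print(np.round(eigenvalues, 2), components.shape)
```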
Describe factor analysis
Factor analysis is similar to principal component analysis, except that factor analysis is used to describe observed variables in terms of fewer unobserved variables called “factors” and can be seen as more efficient.
Factor analysis is often used as the second stage of principal component analysis. In terms of the process, step one is to standardize principal components. Then, the values of the new variables (factor loadings) should be standardized such that the mean equals zero and the standard deviation is equal to one. Even though factor loadings are not comparable (from a size and range perspective) to original variables, they are comparable to each other.
Factors will be contingent on the criteria used to conduct what is called the “rotation.” The varimax method is a rotation method used to target either small or large loadings of a particular variable associated with each factor. As a result of iteratively rotating factor pairs, the resulting solution yields results that make it feasible to identify each variable tied to a single factor. A final solution is reached once the last round provides no added benefit.
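A brief sketch, assuming scikit-learn version 0.24 or later (which exposes a varimax rotation option) and synthetic data built from two hidden factors:

```python
# Minimal sketch (synthetic data; assumes scikit-learn >= 0.24, which supports
# rotation="varimax"): factor analysis on standardized ratios.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 2))                       # two hypothetical hidden factors
loadings_true = np.array([[0.9, 0.0], [0.8, 0.1], [0.0, 0.9], [0.1, 0.8]])
observed = latent @ loadings_true.T + 0.3 * rng.normal(size=(300, 4))

Z = StandardScaler().fit_transform(observed)
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(Z)

print(np.round(fa.components_, 2))   # rotated loadings: each variable tied mainly to one factor
scores = fa.transform(Z)             # factor scores for each firm
print(scores.mean(axis=0).round(2), scores.std(axis=0).round(2))
```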
Canonical correlation method
- The canonical correlation method is a technique used to address the correspondence between a set of independent variables and a set of dependent variables.
- As an example, if an analyst wanted to understand what is explaining the default rate and any changes in default rates over various time horizons, he can look at the relationship between default rate factors and financial ratio factors and understand what common dimensions exist between the two sets and the degree of shared variance.
- This analysis, which is a type of factor analysis, helps us find linear combinations of the two sets that have a maximum correlation with each other. From this analysis, we can determine how many factors are embedded in the set of dependent variables and what the corresponding factors are out of the independent variables that have maximum correlations with the factors from the dependent variable set. The factors from both sets are independent of one another.
- Although this method is very powerful, the disadvantages are that it is difficult to rigorously calculate scores for factors, and measuring the borrower profiles can only be done by proxy as opposed to measuring them in new independent and dependent factors.
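As a rough illustration, assuming scikit-learn is installed: two synthetic variable sets share one hidden dimension, and the canonical pair that captures it shows a large correlation while the second pair does not.

```python
# Minimal sketch (synthetic data; assumes scikit-learn is installed): canonical
# correlation between a set of financial-ratio variables (X) and a set of
# default-rate variables (Y).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(4)
common = rng.normal(size=(250, 1))                       # shared hypothetical dimension
X = np.column_stack([common[:, 0] + 0.5 * rng.normal(size=250),
                     rng.normal(size=250)])              # ratio-side variables
Y = np.column_stack([-common[:, 0] + 0.5 * rng.normal(size=250),
                     rng.normal(size=250)])              # default-rate-side variables

cca = CCA(n_components=2).fit(X, Y)
Xc, Yc = cca.transform(X, Y)

# Magnitude of each canonical correlation (first pair large, second near zero here).
for i in range(2):
    corr = np.corrcoef(Xc[:, i], Yc[:, i])[0, 1]
    print(f"canonical correlation {i + 1}: {abs(corr):.2f}")
```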
Describe the use of a cash flow simulation model in assigning rating and default probability, and explain the limitations of the model
- A cash flow simulation model is most often used to assign ratings to companies that have nonexistent or relatively meaningless track records. In an ideal situation, a given firm’s future cash flow simulation sits midway between structural and reduced form models. The simulation is based on forecasting a firm’s pro forma financial reports and studying the volatility of future performance.
- One of the biggest risks of cash flow simulation models is model risk, which stems from the fact that any model serves as a simplified version of reality. Defining default for the purposes of the model is also challenging, as it cannot always be known if and when a default will actually be filed in real-life circumstances.
- Therefore, the default threshold needs to be set such that it is not too early (the risk of having too many defaults, resulting in transactions that are deemed risky when they are not truly risky) and not too late (the risk of having not enough defaults, thereby understating the potential risk).
- Costs must also be taken into account, as models can cost a lot of money to build, maintain, and calibrate.
- Even given these issues, there are not many feasible alternatives to using the simulation model for a firm in certain conditions when historical data cannot be observed.
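The core idea can be sketched with hypothetical parameters: simulate annual cash flows, flag paths where the cash position breaches a default threshold, and use the breach frequency as the estimated PD.

```python
# Minimal sketch (hypothetical parameters): Monte Carlo simulation of future cash flows
# to estimate a default probability for a firm without a meaningful track record.
import numpy as np

rng = np.random.default_rng(5)
n_paths, n_years = 10_000, 5
opening_cash = 20.0
expected_cash_flow = 8.0          # hypothetical mean annual free cash flow
cash_flow_volatility = 12.0       # hypothetical annual volatility
debt_service = 10.0               # hypothetical annual debt payment
default_threshold = 0.0           # default if the cash position falls below this level

cash = np.full(n_paths, opening_cash)
defaulted = np.zeros(n_paths, dtype=bool)
for _ in range(n_years):
    cash_flow = rng.normal(expected_cash_flow, cash_flow_volatility, n_paths)
    cash = cash + cash_flow - debt_service
    defaulted |= cash < default_threshold

print("Estimated 5-year PD:", defaulted.mean())
```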
Heuristic and numerical methods in predicting defaults
Through the application of artificial intelligence methods, other techniques have been applied to predicting default in recent years. The two primary approaches are:
- Heuristic methods. These methods are designed to mirror human decision-making processes and procedures. Trial and error is used to generate new knowledge rather than using statistical modeling. These methods are also known as “expert systems,” with a goal of reproducing high-frequency standardized decisions at the highest level of quality at a low cost. The fundamental idea is to learn from both successes and errors.
- Numerical methods. The objective of these methods is to derive optimal solutions using “trained” algorithms and incorporate decisions based on relatively weak information in very complex environments. An example of this is a “neural network,” which is able to continuously update itself in order to incorporate modifications to the environment.
Expert system
An expert system, which is a traditional application of artificial intelligence, is a set of software solutions designed to produce answers to problems where human experts would otherwise be needed. Expert systems will typically involve the creation of a knowledge base and will use knowledge engineering to gather and codify knowledge into a framework.
The typical components of an expert system include the working memory (short-term memory), the user interface/communication, the knowledge base (long-term memory), and the inferential engine (the heart/nervous network).
The rule base of an expert system consists of many inference rules (which are designed to resemble human behavior); these go into the knowledge base as separate rules, and the inference engine serves to bring them together to draw conclusions.
The inference engine can use either backward chaining or forward chaining.
- With backward chaining (goal driven), the starting point is a list of goals. Working backward, the expert system will look to find paths that will allow it to achieve these goals. Rules are searched until one is found which best aligns to the desired goal.
- With forward chaining (data driven), the starting point is available data. Inference rules are applied until a desired goal is achieved. Once the path is recognized as successful, it is applied to the data.
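A tiny sketch of forward (data-driven) chaining over a hypothetical credit rule base; the rules and facts are illustrative only.

```python
# Minimal sketch (hypothetical rule base): forward chaining -- start from available facts
# and fire inference rules until no new conclusions can be drawn.
rules = [
    ({"high_leverage", "weak_coverage"}, "financial_stress"),
    ({"financial_stress", "negative_outlook"}, "reject_application"),
]
facts = {"high_leverage", "weak_coverage", "negative_outlook"}

changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)   # rule fires and its conclusion becomes a new fact
            changed = True

print("reject_application" in facts)   # True: the goal is reached from the data
```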
An expert system may also incorporate “fuzzy logic” applications. This logic applies “rules of thumb” based on feelings and uses approximate as opposed to precise reasoning. A fuzzy logic variable is not confined to the extremes of zero and one; rather, it can assume any value between the two extremes.
A subset of expert systems is decision support systems (DSSs), which are applied to certain phases of the human decision-making process and involve very complex and cumbersome calculations.
Neural networks
Neural networks come from biological studies and serve to simulate human brain behavior. These networks involve the interconnection of artificial neurons (software programs designed to mirror the properties of biological neurons) and have the ability to continuously learn by experience.
- One of the key benefits of the neural network method is its ability to capture nonlinear relationships. Because a network may have thousands of nodes and even more potential connections, the flexibility exists to handle highly complex, nonlinear, recursive, and independent problems. The most common structure is the “hierarchically dependent neural network.”
- In terms of limitations, there is no way to look step-by-step at neural networks to determine how results are obtained; we have to accept that the results will come from what appears like a “black box,” which makes it impossible to explain how and why we arrived at a specific result. A way around this issue is to prepare multiple data sets characterized by distinguishing profiles and then put them in the neural network to obtain results. With outputs coming from homogeneous inputs, it is possible to then deduce the critical variables and their associated weights.
- Also, these networks are highly sensitive to the quality of the inputs; as such, data sets must be carefully chosen to not have the model learn from outliers.
- In addition, continuous quantitative variables are more appropriate for neural networks than qualitative variables.
- Over-fitting is a major risk for estimating neural networks, as a network that over-fits a sample of data will not be able to produce quality results when applied to other samples, such as sectors, borrowers, economic cycle stages, and geographic areas.
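The sketch below, assuming scikit-learn is installed, trains a small feed-forward network on synthetic, nonlinearly generated defaults and uses a held-out sample to check for over-fitting.

```python
# Minimal sketch (synthetic data; assumes scikit-learn is installed): a small
# feed-forward neural network classifying defaults from two hypothetical ratios,
# with a held-out sample to check for over-fitting.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n = 2_000
leverage = rng.uniform(0.0, 1.0, n)
profitability = rng.uniform(-0.2, 0.3, n)
# Nonlinear synthetic rule: defaults concentrate where leverage is high AND profits are low.
pd_true = 1.0 / (1.0 + np.exp(-(6.0 * leverage * (0.1 - profitability) - 1.0)))
default = rng.uniform(size=n) < pd_true

X = StandardScaler().fit_transform(np.column_stack([leverage, profitability]))
X_train, X_test, y_train, y_test = train_test_split(X, default, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(8, 4), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
print("In-sample accuracy:", round(net.score(X_train, y_train), 3))
print("Out-of-sample accuracy:", round(net.score(X_test, y_test), 3))
```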
Comparison of heuristic and numerical methods
- An expert system is advantageous when human experts have known, clear, and well-mastered experience; this experience allows for the formalization of rules and the building of effective systems.
- For the purposes of rating assignments, expert systems provide objectivity, order, and discipline to the ratings process; however, they do not provide new knowledge because they are not inferential methods or models.
- Numerical approaches, like neural networks, provide classifications, often with low granularity (like very good, pass, reject, etc.). These models are not statistical models and, therefore, do not produce outputs like probabilities of default. This limitation, along with the “black box” limitation, limits the usefulness of neural networks outside of segments such as personal loans or consumer credit. However, they can be used for potential early warnings and credit quality monitoring. Also, a neural network is very useful for processing extremely large quantities of data, adjusting quickly when a discontinuity occurs, and creating new rules when a change in the pattern of success/failure is uncovered.
Comparing heuristic approaches (i.e., expert systems and decision support systems) to numerical approaches (i.e., neural networks) across the three key features of a good ratings system discussed earlier shows the following results:
- Objectivity and Homogeneity. Both are almost entirely compliant.
- Specificity. The numerical approach is 73% compliant, while the heuristic approach is 30% compliant.
- Measurability and Verifiability. The numerical approach is 75% compliant, while the heuristic approach is 50% compliant.
Describe the role and management of qualitative information in assessing probability of default
From the perspective of using judgment to ultimately determine credit approval, three categories are used to encapsulate qualitative information:
- Investment, innovation, and technology.
- Human resource management, motivation, retention of key resources, and maximizing talent.
- Effective and efficient internal processes.
Categorical types of information include binary information (such as yes/no), nominal information (like locations of incorporation), and ordinal classifications with graduating levels (such as low, medium, and high).
- Binary information can be represented as dummy variables (i.e., 0 or 1).
- Ordinal information can be assigned numbers and weights differing at each level. Even with these options for quantification, the lack of historical data is a major problem with using qualitative information.
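A short sketch, assuming pandas is installed and using hypothetical categories, of turning binary, nominal, and ordinal qualitative information into model-ready variables:

```python
# Minimal sketch (hypothetical categories; assumes pandas is installed): encoding binary,
# nominal, and ordinal qualitative information for use in a quantitative model.
import pandas as pd

qualitative = pd.DataFrame({
    "has_audited_statements": ["yes", "no", "yes"],            # binary
    "region": ["north", "south", "north"],                     # nominal
    "management_quality": ["low", "high", "medium"],           # ordinal
})

encoded = pd.DataFrame({
    # Binary -> dummy variable (0/1).
    "has_audited_statements": (qualitative["has_audited_statements"] == "yes").astype(int),
    # Ordinal -> graduated numeric levels (weights chosen by the modeler).
    "management_quality": qualitative["management_quality"].map({"low": 1, "medium": 2, "high": 3}),
})
# Nominal -> one dummy column per category.
encoded = encoded.join(pd.get_dummies(qualitative["region"], prefix="region", dtype=int))

print(encoded)
```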
A potential mechanism for overcoming these issues is to invoke a two-stage process:
- Stage 1: Build a quantitative model along with launching a systematic qualitative data collection on new reports.
- Stage 2: Once Stage 1 has produced enough information, build a new model which includes the new qualitative information.
In spite of the challenges of incorporating qualitative data, this data set is a critical element to building powerful credit models and driving long-term value creation for banks.
Recommendations for using qualitative information in models
- A first recommendation is to gather only qualitative information that is not collectable in quantitative terms; for instance, growth and financial structure information can already be extracted from balance sheets.
- A second recommendation concerns how qualitative information should be managed within quantitative models (e.g., how it is encoded and weighted, as discussed above).
Basic characteristics of Black-Scholes-Merton model (assumptions)
The simplest form of the model assumes the existence of a non-dividend paying firm with only one liability claim and that financial markets are perfect. That is, the model assumes away taxes, bankruptcy costs, and costs associated with enforcing contracts.
The Black-Scholes-Merton option-pricing model for European options can be modified to determine the value of equity at time t prior to maturity T (i.e., over the remaining time to maturity T - t), if additional assumptions are made, which include:
- Firm value characterized by a lognormal distribution with constant volatility, σ.
- Constant interest rate, r.
- Perfect financial market with continuous trading.
The Value of Equity at Time t
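Under the assumptions above, equity at time t is valued as a European call on firm value V(t) with strike equal to the face value of the zero-coupon debt (denoted F here; the notation is assumed for this sketch):

$$E(t) = V(t)\,N(d_1) - F e^{-r(T-t)}\,N(d_2)$$

$$d_1 = \frac{\ln\big(V(t)/F\big) + \big(r + \tfrac{1}{2}\sigma^2\big)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t}$$

where N(.) is the standard normal cumulative distribution function.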

The Value of Debt at Time t
There are two methods for valuing risky debt in this framework. Risky debt is equal to:
- Risk-free debt minus a put option on the firm.
- Firm value minus equity value.
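Using the same notation as the equity card above (a standard result under the model's assumptions), both methods give the same value for risky debt:

$$D(t) = V(t) - E(t) = V(t)\,N(-d_1) + F e^{-r(T-t)}\,N(d_2)$$

Equivalently, D(t) equals the value of risk-free debt, F e^{-r(T-t)}, minus the value of a European put on V(t) with strike F.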
Figure 2 shows the general relationships between debt and equity values according to the inputs of the Merton model.

Distance to default (DD) and Lognormal DD
Price-based DD = [V(t) - Default point] / [sigma * V(t)]
Lognormal DD = [ln(V(t)) - ln(Default point) + (mu - sigma^2/2) * (T - t)] / [sigma * sqrt(T - t)]





