Presentation

Case Study: Cantilever beam

Thyge Vinther Ludvigsen, s203591

13 May, 2026

Plan

Time Plan

Countinuous damage states

  • Make an funciton for given H1 what is the stiffness reduction factor for the damage state
  • Make an rugth function there links the stiffness reduction factor to probability of failure? for the structure
  • Use the 2 functions above to update the cost matrix for undetected damage
  • Upadate FEM model to an more realistic structure. (2D, more dof’s, elments, modes etc.)
  • Set up 2-3 cass whit realistic priors and cost matrix
    • read papers and method to set up the priors and cost matrix (Takes time)
  • Implement optimization algorithm’s to find the optimal decision cretiria for the optimal snosor placements
    • what is the given information ??
  • Ilustative Work on Fream work for testing assumptions? (cut be fun?)
    • Make an sugestion for an surgate model for error in OMA (Not computabol ???)
  • Thigtening up the othere work ?
  • Comparison with traditional methods
    • we do not have the samme ground truth to compare with ??? (Not that intresting)

Stiffness reduction

Stiffness reduction effect’s

Stiffness reduction factor: \[ E_{damaged} = (1 - d) E_{undamaged} \]

where:

  • \(E_{damaged}\) is the effective Young’s modulus of the damaged structure.
  • \(E_{undamaged}\) is the Young’s modulus of the undamaged structure.
  • \(d\) is the damage variable, which ranges from 0 (no damage) to 1 (complete failure).

Static loading

Lowering the stiffness leads to:

  • \(\uparrow \delta\) (higher displacements) and \(\uparrow \epsilon\) (higher strains) (SLS)
  • Redistribution of loads in indeterminate systems, to stiffer elements. (ULS)
  • \(\downarrow P_{cr}\) lower buckling capacity, do to lower stiffness in out of plane buckling. (ULS)

Dynamic loading

Lowering the stiffness leads to:

  • \(\downarrow \omega\) lower natural frequencies (Resonance effects)
  • Large amplitude of the vibrations for the same loading (conservation of energy in equation of motion)

Note

If the stiffness reduction is due to plastic deformations, then the ductility of the structure can be decreased for further loading. This has an effect on:

  • Earthquakes where there are large cyclic loading, where the structure rely on ductility to dissipate energy.

FEM model for evaluating the effect of stiffness reduction

Static loading:

  • Linear FEM model: to capture load redistribution in indeterminate systems. (if deformations are small)
  • Nonlinear FEM model: to capture geometric instability (buckling), and non linear effects (if the deformations are not small.)
    • mostly geometric nonlinearity.

Dynamic loading:

  • dynamic linear FEM model: to capture the effect of Modal parameters when the stiffness is reduced. (if deformations are small)
    • This is the case when we do OMA.
  • dynamic nonlinear FEM model: to capture nonlinear effects, and buckling (if deformations are not small)
    • both matrial Nonlinearity and geometric nonlinearity can be captured in this model.

Continuous Damage States

General idea

Probability of level of stiffness reduction: \[ P(\delta E \mid \theta) \]

where:

  • \(\delta E\) is the stiffness reduction factor

Probability of failure given damage state add stiffness reduction factor: \[ P(\text{failure} \mid \delta E, \theta) \]

Boundary conditions: stiffness reduction factor

No damage leads to no stiffness reduction: \[ P(\delta E > 0 \mid \theta = \text{No damage}) = 0, \quad P(\delta E = 0 \mid \theta = \text{No damage}) =1 \]

Probability mass for damage state: \[ P(\delta E > 0\mid \theta = \text{damage}) = \int_0^1 P(\delta E \mid \theta = \text{damage}) d\delta E = 1 \]

Probability of level of stiffness reduction

\[ P(\delta E \mid \theta) \]

This is an function where we have no good way to estimate it, but we can make some assumptions about the shape of the function.

Assumptions:

  1. The probability of stiffness reduction of close to 100% is very low?
  2. The probability of stiffness reduction of close to 0% is low?
    • Argument 1: There is no value of upserving small martrial degradations, even if they are present, they are only valuable if they leed to faster degradations there will have an effect on the probability of failure.
    • Argument 2: the prior are constructed on the basis of \(H_1\) being that the structure is damaged, (If damage is defined as an stiffness change there leads to a significant change of probability of failure. )

Probability of failure given stiffness reduction

\[ P(\text{failure} \mid \delta E, \theta) = P(g(X) < 0 \mid \delta E, \theta) \]

variables: \[ X = \{\delta E, \theta, F, f_y, \epsilon_{ult}, ...\} \]

  • \(\delta E\) is the stiffness reduction factor
  • \(\theta\) is the damage state vector
  • \(F\) is the applied load
  • \(f_y\) is the yield strength of the steel
  • \(\epsilon_{ult}\) is the ultimate strain capacity of forexample the concrete

Note

  • \(\delta E\) is not an stochastic variable
  • \(\theta\) is not an stochastic variable
  • all the other variables are stochastic variables.

limit state function: \[ g(X)=R(X)−S(X) \]

where:

  • \(R(X)\) is the resistance / the capacity of the structure.
  • \(S(X)\) is the load effect / the response of the structure to the applied load
  • Failure occurs when \(g(X) < 0\)

Clarification

This is a note that will appear as a colored box in your PDF.

Limit state function

My expectations of the probability of failure

Relationship between stiffness reduction and probability of failure:

  • \(\downarrow \delta E \rightarrow \downarrow P_{cr}\): Liniar relationship in Static loading whit no redistribution of loads (\(P_{cr} \propto E I\))
    • Eurlerar buckling formula: \(P_{cr} = \frac{\pi^2 E I_z}{(K L)^2}\)
      • \(K\) is the effective length factor

Probability functions

I am intrested in probability distributions in the interval [0,1].

Beta PDF: \[ f(x; \alpha, \beta) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)} \]

where:

  • \(\beta\) is the beta function, defined as \(B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}\)
  • \(\Gamma\) is the gamma function.

Stiffness reduction probability distribution:

  • \(a = 4\) and \(b= 4\) (symmetric, mean = 0.5, low probability at tails)
  • \(a = 4\) and \(b= 10\) (mean < 0.5, low probability at tails)

Problem whit continuous damage states

Problem 1

Eigen value problem: \[ (K(\theta, \delta E) - \omega^2 M) \phi = 0 \]

  • \(\omega\) is the natural frequency, eigen values
  • \(\phi\) is the mode shape, eigen vectors
  • \(K\) is the stiffness matrix
    • the static loading is also included in the stiffness matrix ???
  • \(M\) is the mass matrix

For the cantilever beam dos \(10^4\) evaluations of the eigen value problem take 32s on my PC.

Problem:

  • It is some what expensive to evalueta the omega and phi for each evaluation

Solution 1:

  • Build an surrogate model of the modal prameters given the siffness reduction factor, and the damage state.
    • Problem 1: the cost of building the surrogate model, increases fast whit the number of posibole damage states \(\theta\).
    • Problem 2: For my structure is the relationship cose to linear, but i think that for more complex structures is the relationship not always close to linear.
    • Problem 3: changing of the order of modes and mode shapes, when the stiffness is reduced, can lead to problems with the surrogate model.

Solution 2:

  • Use the Build FEM model maby it is fast enough

Sulution 3:

  • Extract stiffness and mass matrices form the model and try to build an faster solver?
import math as math
n = 10**3 # number of elements
k = 3 # size of combinations 

math.factorial(n) / (math.factorial(k) * math.factorial(n-k))
166167000.0

Example for understanding

  • \(\omega_1\) for the first axis
  • \(\omega_2\) for the second axis

Stochastic modal parameters:

\[ (\bar{\boldsymbol\omega}, \bar{\boldsymbol\phi}) \sim p(\boldsymbol\omega, \boldsymbol\phi \mid \theta) \]

\[ (\bar{\boldsymbol\omega}, \bar{\boldsymbol\phi}) \sim g(\theta, e) + \epsilon(\theta, e) \]

Where:

  • \(g(\theta, e)\) is the deterministic part of the model, which depends on the structural state \(\theta\) and the sensor configuration \(e\).
  • \(\epsilon(\theta, e) = \epsilon_0(\cdot) + ... + \epsilon_n(\cdot)\)

Likelihood for the Level 1 SHM system:

\[ f(y \mid H_i)= \sum_{\theta \in H_i} \int f(y, \theta, \delta E \mid H_i) d\delta E =\sum_{\theta \in H_i} \int f(y \mid \theta, \delta E) P(\theta \mid H_i) f(\delta E|\theta) \, d\delta E \]

Posterior distribution (level 1 SHM system):

\[ P(H_i \mid y) = \frac{f(y \mid H_i) P(H_i)}{f(y)} \]

Joint distribution of \(\omega_1\) and \(\delta E\) given the damage state \(\theta\):

\[ f_{\omega_1}(\omega_1, \delta E|\theta) = f_{\omega_1}(\omega_1 \mid \theta, \delta E)f_{\delta E}(\delta E|\theta) \]

Marginal distribution of \(\omega_1\) given the damage state \(\theta\): \[ f_{\omega_1}(\omega_1 \mid \theta) = \int f_{\omega_1}(\omega_1 \mid \theta, \delta E)f_{\delta E}(\delta E|\theta) d\delta E \]

Note for plot

  • this is only an ilustration of one damage state \(\theta = 3\).

Nice Book

Structural Reliability Analysis and Prediction

Questions ?

Work plan

  • Am i missing something important in the work plan ?
  • Is it oaky that i dont do comparison with traditional methods ? (it is form the procject plan)

Other

  • good ideas for building an surrogate model for the mapping \(\Delta E\) to the modal parameters ?
  • good ideas for seting up the ralationship between the stiffness reduction factor and the probability of failure ?
  • what do you think about the stiffness reduction function?

Note:

  • can you send the bridge model so i can see it?

Idear’s

Train the covariance matrix on the modal data, so i don’t need to know the parameters for the covariance matrix in the OMA error function?

Meting 2 (14/07)

Overview of last weeks meting:

  • Understanding of what stiffness reduction effect’s (static and dynamic loading).
  • Understanding of how to compute the probability of failure.
  • Presentation of curent problems and ideas for solving them.

Note: new procject name and disription

Presentation overview

  • Generate data with an continuous stiffness reduction factor, and damage state. (Step 1) (Wensday-friday)
    • Set up
      • FEM model
      • Priors \(\theta, H\)
      • distribution of the stiffness reduction factor
    • What covariance matrix to use? given e sensor configuration, and damage state?
    • Make surrogate model for the mapping from stiffness reduction factor to modal parameters?
      • investigate the relationship between the stiffness reduction factor and the modal properties.
      • plot all this infromation
      • make the surrogate model
    • Generate Modal parameter data set
  • Calculate features (Step 2) (friday)
    • \(y = [\omega_1, ... , \omega_n, MAC_{1,1}, ... , MAC_{n,n}, TMAC, MF]\)
    • print features
  • Feature selection and dimensionality reduction (Step 3) (friday-Monday)
    • Feature selection
    • Feature selection with the inclustion of cunsequences (cost matrix)
      • Assume link between stiffness reduction factor and probability of failure.
  • Training of decision model (Step 4) (Monday-Tuesday)
    • Train and Test (80/20)
      • Gaussian Mixture Model (GMM)
      • Logistic regression
    • Evaluation of decision model
  • Make full case study with the decision models (Step 5) (Tuesday)

Questions ?

Questions for this week:

Question 1: Probability of failure?

What shot i Asume the link function to be between the \(\Delta E\), \(\theta\) and the probability of failure?

Solutions:

Level 0:

  • Asumme an some what abetrary link function.

Level 1:

  • Asume that the Probability of failure is dominated by static loading, That can be discribed by an linear elastic FEM model.

Level 2:

  • Asume that the Probability of failure is dominated by static loading, That can be discribed by an Geometric Nonlinear FEM model.

Level 3-4:

  • Combiantion of matrial and geometric nonlinear dynamics FEM model, where the probability of failure.

Questions for writing the report:

Question 1: literature review

I have not written a literature review before, and could use some inputs.

  • Do you have some tips?
  • Do you have guidelines for writing a good literature review?
  • Do you have an example of a good literature review that I can look at?

Question 2: Level of detail?

  • Asume An thnical reader, with a good understanding of probability and Machine learning, but no understanding of structural health monitoring?

Question 3: Structure of my report

Is this an nice structure for my report?

Questions next week:

Question 1: Prior probability’s of damage?

  • How can i set up good priors for \(P(H_1)\) and \(P(H_0)\)?
    • Is it okay to use uniform priors for \(P(\theta)\)?

Question 2: Cost per sensor placement?

  • How can i build a good cost per sensor function?

Question 3: Cost matrix

  • How do i set up an realistic cost matrix
    • Expected cost of repair ?
    • Expected cost of un-detected failure ?
    • Expected cost of inspection ?

FEM model

Geometry

Mode Shapes

Damaged scenarios

Damage scenarios: 
[(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 1, 0, 0), (0, 1, 0, 1), (0, 1, 1, 0), (0, 1, 1, 1), (1, 0, 0, 0), (1, 0, 0, 1), (1, 0, 1, 0), (1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0), (1, 1, 1, 1)]
size of theta: 16
Plot of case: (1, 1, 1, 1): 

Sensor configurations

# nodes  
e_sensor = np.arange(len(X))

# list of fixed nodes
fixed_nodes = [bounds[i][0] for i in range(len(bounds))]

# remove fixed nodes
e_sensor = e_sensor[~np.isin(e_sensor, fixed_nodes)] # ~ is the logical NOT operator, which inverts the boolean array returned by np.isin
print(f"Sensor nodes: {e_sensor}")

# make an list whit all possible combinations of sensor placements
import itertools # generate the solution space of all possible combinations.
# e_configuration = list(itertools.combinations(e_sensor.tolist(), 3))
e_configuration =  np.array(list(itertools.product([0, 1], repeat=len(e_sensor)))) # all combinations of 0 and 1 for each sensor node
e_configuration = e_configuration 
print(f"Number of sensor configurations: {len(e_configuration)}")
for i in range(len(e_configuration)):
    print(f"Configuration {i} given in nodes with sensors: {e_configuration[i]}")
Sensor nodes: [1 2 3 4]
Number of sensor configurations: 16
Configuration 0 given in nodes with sensors: [0 0 0 0]
Configuration 1 given in nodes with sensors: [0 0 0 1]
Configuration 2 given in nodes with sensors: [0 0 1 0]
Configuration 3 given in nodes with sensors: [0 0 1 1]
Configuration 4 given in nodes with sensors: [0 1 0 0]
Configuration 5 given in nodes with sensors: [0 1 0 1]
Configuration 6 given in nodes with sensors: [0 1 1 0]
Configuration 7 given in nodes with sensors: [0 1 1 1]
Configuration 8 given in nodes with sensors: [1 0 0 0]
Configuration 9 given in nodes with sensors: [1 0 0 1]
Configuration 10 given in nodes with sensors: [1 0 1 0]
Configuration 11 given in nodes with sensors: [1 0 1 1]
Configuration 12 given in nodes with sensors: [1 1 0 0]
Configuration 13 given in nodes with sensors: [1 1 0 1]
Configuration 14 given in nodes with sensors: [1 1 1 0]
Configuration 15 given in nodes with sensors: [1 1 1 1]

Generate data (Step 1)

Prior distibution

H_1 = 0.01
H_0 = 1-H_1

# uninformed prior
N_prior = 2**4
prior_H = np.array([H_0, H_1])
prior_theta = np.concatenate(([H_0], np.repeat(H_1 / (N_prior - 1), N_prior - 1)))

print(f"Prior probabilities for H: {prior_H}")
print(f"\nPrior probabilities for θ: {np.round(prior_theta, 4)}")
Prior probabilities for H: [0.99 0.01]

Prior probabilities for θ: [9.9e-01 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04
 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04]

f(\(\delta E\))

I am intrested in probability distributions in the interval [0,1].

from scipy.stats import beta as beta_dist

error = 0.01
error = 0.01
x = np.linspace(0+error, 1-error, 100)

plt.figure(figsize=(8,4))
plt.plot(x, beta_dist.pdf(x, a = 1, b = 1), label=r'$\delta E \sim \beta(a=1, b=1)$', color='blue')
plt.plot(x, beta_dist.pdf(x, a = 4, b = 4), label=r'$\delta E \sim \beta(a=4, b=4)$', color='orange')
plt.plot(x, beta_dist.pdf(x, a = 4, b = 10), label=r'$\delta E \sim \beta(a=4, b=10)$', color='green')
plt.xlabel(r'$\delta E$')
plt.ylabel(r'$f(\delta E \mid H_1)$')
plt.title('Beta Distributions')
plt.legend()
plt.tight_layout()
plt.grid()
plt.show()

Covariance matrix OMA error

  • I assume that the coefficient of variation is 0.01 for the natural frequency estimations.
  • I also assume that the standard deviations is 0.1 for mode shape estimation.

Note

this coficient of variation and standard deviation is what i expect for an vell deifined Sensor configuration.

array([0.01095177, 0.06061032, 0.16487293, 0.29009585, 0.1049361 ,
       0.1049361 , 0.1049361 , 0.1049361 , 0.090164  , 0.090164  ,
       0.090164  , 0.090164  , 0.08687253, 0.08687253, 0.08687253,
       0.08687253, 0.08483097, 0.08483097, 0.08483097, 0.08483097])
S = np.array( [1, 1, 1, 1])
CV_omega_data = 0.01 # use *1.3 for the real model 
std_omega_data = 0.1

# frequencyes and mode shapes
omega_all = load["natural_frequencies"]
phi_all = mode_shapes



cov_dig = (np.concatenate([
    omega_all[0] * CV_omega_data,
    np.repeat(np.ones(np.sum(S, dtype=int)) * std_omega_data, np.sum(S, dtype=int))
]))**2

covariance_matrix = np.diag(cov_dig)

# plot the covariance matrix

plt.figure(figsize=(10, 6))
plt.title(r'Covariance Matrix Diagonal Values for $\omega_{1,2,3,4}$ and $\phi_{1,2,3,4}$')
plt.plot(covariance_matrix.diagonal()[:omega_all.shape[1]], marker='o', label=r'Variance of $\omega_{1,2,3,4}$', color='blue')
plt.plot(range(omega_all.shape[1], omega_all.shape[1] +np.sum(S, dtype=int)), covariance_matrix.diagonal()[omega_all.shape[1]:omega_all.shape[1] + np.sum(S, dtype=int)], marker='o', label=r'Variance of $\phi_1$', color='orange')
plt.plot(range(omega_all.shape[1] + np.sum(S, dtype=int), omega_all.shape[1] + 2* np.sum(S, dtype=int)), covariance_matrix.diagonal()[omega_all.shape[1] + np.sum(S, dtype=int):omega_all.shape[1] + 2* np.sum(S, dtype=int)], marker='o', label=r'Variance of $\phi_2$', color='green')
plt.plot(range(omega_all.shape[1] + 2* np.sum(S, dtype=int), omega_all.shape[1] + 3* np.sum(S, dtype=int)), covariance_matrix.diagonal()[omega_all.shape[1] + 2* np.sum(S, dtype=int):omega_all.shape[1] + 3* np.sum(S, dtype=int)], marker='o', label=r'Variance of $\phi_3$', color='red')
plt.plot(range(omega_all.shape[1] + 3* np.sum(S, dtype=int), omega_all.shape[1] + 4* np.sum(S, dtype=int)), covariance_matrix.diagonal()[omega_all.shape[1] + 3* np.sum(S, dtype=int):omega_all.shape[1] + 4* np.sum(S, dtype=int)], marker='o', label=r'Variance of $\phi_4$', color='purple')
plt.xlabel('Index')
plt.ylabel('Variance Value')
plt.ylim(0, 1)
plt.legend()
plt.grid()
plt.show()

Delta E to modal parameters mapping

  • Generate data form the FEM model
# code_run = True
code_run = False
N= 10**4
# generate delta E samples from the beta distribution
# delta_E_samples = np.random.beta(a=1, b=1, size=N)
delta_E_samples = np.linspace(0, 0.99999, N) # use a grid of delta E values instead of random samples for better visualization
if code_run:

    from python_code.OMA_class import OMA_class
    OMA = OMA_class(print_code=False, plot_code=False)

    omega_samples = np.zeros((N, omega_all.shape[0], omega_all.shape[1]))
    phi_samples = np.zeros((N, phi_all.shape[0], phi_all.shape[1], phi_all.shape[2]))

    for i in range(N):
        OMA.extract_all_feature_cases(FEM, damage_level=delta_E_samples[i], name="damage_scenarios.npz", save_to_file=False)
        omega_samples[i, :] = OMA.true_natural_frequencies
        # (OMA.true_mode_shapes[:, :, 0::3].astype(float))[:, :, e_sensor]
        true_mode_shapes = np.asarray(OMA.true_mode_shapes, dtype=float)
        mode_shapes = true_mode_shapes[:, :, 0::3][:, :, e_sensor]
        # Normalize each mode shape by its maximum absolute value
        max_abs = np.max(np.abs(mode_shapes), axis=-1, keepdims=True)
        max_abs[max_abs == 0] = 1
        mode_shapes = mode_shapes / max_abs

        phi_samples[i, :, :, :] = mode_shapes

    # 10**4 evalutions take 33.5s on my PC with
    # save omega_samples and phi_samples to a file for later use
    np.savez("data/omega_phi_samples_deltaE.npz", omega_samples=omega_samples, phi_samples=phi_samples)

# load omega_samples and phi_samples from file
data = np.load("data/omega_phi_samples_deltaE.npz")
omega_samples = data["omega_samples"]
phi_samples = data["phi_samples"]


# OMA.true_natural_frequencies, OMA.true_mode_shapes

Data visualization omega

  • discontinuous functions for mode 4

Data visualization Mode shape 1

  • See the collapse of the natural frequencies for modeshape 4 in damage scenario [2-10, 13]

MAC mode to mode

  • MAC of mode 4 to 4 colapse in damage scenario [2-10, 13]

TMAC

Surrogate model for \(\delta E\)

The surrogate model works as follows:

  1. \((\theta)\) selects a precomputed modal parameter dataset.
  2. \((\delta E)\) is used to locate the two nearest neighboring samples in that dataset.
  3. Linear interpolation between those neighboring modal parameter vectors gives the estimated modal parameters for the requested \((\delta E)\).
    • \(y(\delta E)=y_0+\frac{\delta E-\delta E_0}{\delta E_1-\delta E_0}(y_1-y_0)\)
      • \(y\) is the colapsed modal parameter vector, there include the natural frequencies and mode shapes.

Time comparison:

  • FEM model: 32s for \(10^4\) evaluations
  • Surrogate model: 3.13 ms or 0.00313 s for \(10^4\) evaluations

Surrogate model for \(\delta E\)

do to the colapse of mode shape 4 in damage scenario [2-10, 13] have i 2 good options:

  1. only do stiffness reduction up to 80%, so i do not hit the colapsing areas
  2. don’t include mode shape 4 in the model

Note: i think that i will hit problems in many structures when i reduce the stiffness a lot. that my model is not able to compute the modal parameters??? but it is strange that i setil get perfect eigenvalues but no eigenvectors??? why is that need bigger understanding of the eigen value problem ore the way that we compute the eigen value problem in the FEM model. Ithink taht we have some matrix restictions on the stiffness and mass matrix to be able to compute them fast / numerically? but is thos restictions dependent on the difrense in stiffness between ellements??

Note

  • I make the surrogate model for the hole space and then after that can i cose to only use some of the stifennes reduction factor if i want
"""3.13 ms ± 244 μs per loop (mean ± std. dev. of 7 runs, 100 loops each) vs 32.5 s for the true FEM mdoel"""
sur_dE = surogate_dE_model_fast(phi_samples, omega_samples, np.array(OMA.theta), delta_E_samples)
sur_dE.train()

delta_E_test = np.random.rand(10**4) * 0.99
theta_index_test = np.random.randint(0, sur_dE.y.shape[1], size=10**4)
theta_index_test = np.zeros(10**4, dtype=int) + 2 # test only for theta_index = 2 for now
# %timeit sur_dE.out_matrix(delta_E_test, theta_index_test)
test_data = sur_dE.out_matrix(delta_E_test, theta_index_test)

sur_dE.out_matrix(delta_E_test, theta_index_test)

sur_dE.out_matrix( np.array([0.139]) ,  np.array([1]))
array([[ 1.04309659,  6.66379311, 18.57191988, 33.83556437,  0.09236701,
         0.32751754,  0.64586596,  1.        ,  0.50181554,  1.        ,
         0.56128953, -0.73610705,  1.        ,  0.37558852, -0.99829447,
         0.4292274 ,  1.        , -0.94228131,  0.57818057, -0.15700558]])

Generate data

I have chosen to use the uniformed beta distribution so the uniformed distribution.

I also use the surrogate model to reduce the computational cost of generating data

"""Generate data set"""
N = 10**6
prior_theta = prior_theta
covariance_matrix = covariance_matrix 
omega_all = omega_all
phi_all = phi_all
omega_H0 = omega_all[0]
S = np.array( [1, 1, 1, 1])

code_run = True
code_run = False
if code_run:
    # improtance sampling prior for the states (theta) H_0 = 0.5 and H_1 = 0.5
    prior_H_Importance = np.array([0.5, 0.5])
    prior_Importance = np.concatenate(([prior_H_Importance[0]], np.repeat(prior_H_Importance[1] / (len(prior_theta) - 1), len(prior_theta) - 1)))

    # Imoportance sampling for structural states: theta ~ Q(theta) where Q is the importance distribution
    states = np.random.choice(range(len(prior_theta)), size=N, p=prior_Importance)
    H_vec = (states > 0).astype(int)  # 0 if state == 0, else 1

    # generate Delta E samples from the beta distribution
    delta_E_samples = np.random.beta(a=1, b=1, size=N)


    # sample OMA error
    lambda_star = np.random.multivariate_normal(mean=np.zeros(covariance_matrix.shape[0]), cov=covariance_matrix, size=N)


    # find the the mean values 
    data_sur = sur_dE.out_matrix(delta_E_samples, np.array(states)) # theta_index = 14 corresponds to Theta=[1, 1, 1, 0]
    mu_omega_sur = data_sur[:, :omega_samples.shape[2]]
    # mu_phi_sur = data_sur[0, 4:].reshape(mu_phi.shape)
    mu_phi_sur = data_sur[:, 4:]

    lambda_star[:, :len(omega_H0)] += mu_omega_sur
    lambda_star[:, len(omega_H0):] += mu_phi_sur
    # lambda_star = lambda_star

    # extract omega and phi from lambda_star
    omega_bar = lambda_star[:, :omega_H0.shape[0]]
    # phi_bar = lambda_star[:, omega_H0.shape[0]:].reshape(mu_phi.shape)
    phi_bar = lambda_star[:, omega_H0.shape[0]:].reshape(N, phi_all.shape[1], phi_all.shape[1])

    # Normalize each mode shape by its maximum absolute value
    max_abs = np.max(np.abs(phi_bar), axis=-1, keepdims=True)
    max_abs[max_abs == 0] = 1
    phi_bar = phi_bar / max_abs

    # save data to file
    np.savez("data/training_data.npz", omega_bar=omega_bar, phi_bar=phi_bar, H_vec=H_vec, delta_E_samples=delta_E_samples, states=states, prior_theta=prior_theta, prior_Importance=prior_Importance)

loaded_data = np.load("data/training_data.npz")
omega_bar = loaded_data["omega_bar"]
phi_bar = loaded_data["phi_bar"]
H_vec = loaded_data["H_vec"]
delta_E_samples = loaded_data["delta_E_samples"]
states_bar = loaded_data["states"]
states = states_bar.copy()
prior_theta = loaded_data["prior_theta"]
prior_Importance = loaded_data["prior_Importance"]

Calculate features (Step 2)

Calculate features

\[ y = [\omega_1, ... , \omega_n, MAC_{1,1}, ... , MAC_{n,n}, TMAC, MF] \]

# code_run = True
code_run = False
if code_run:
    MAC_11 = np.array( [BRinf.MAC(phi_samples[0,0,0], val) for val in phi_bar[:, 0]] )
    MAC_22 = np.array( [BRinf.MAC(phi_samples[0,0,1], val) for val in phi_bar[:, 1]] )
    MAC_33 = np.array( [BRinf.MAC(phi_samples[0,0,2], val) for val in phi_bar[:, 2]] )
    MAC_44 = np.array( [BRinf.MAC(phi_samples[0,0,3], val) for val in phi_bar[:, 3]] )

    TMAC = (MAC_11 + MAC_22 + MAC_33 + MAC_44) / 4

    MF = np.array([BRinf.MF(phi_bar[i], omega_bar[i]) for i in range(N)])

    y = np.column_stack((omega_bar, MAC_11, MAC_22, MAC_33, MAC_44, TMAC, MF))

    # save features 
    np.savez("data/features.npz", y=y, H_vec=H_vec, delta_E_samples=delta_E_samples, states=states, prior_theta=prior_theta, prior_Importance=prior_Importance)

# load features from file
features_data = np.load("data/features.npz")
y = features_data["y"]
y_features = y.copy()

Simple Plots of the features

Simple histograms of the features

Simple statistics of the features

Here blow is there some box plots and an print of statistical information about the features for H = 0 and H = 1.

i = -1
val = 40
plt.figure(figsize=(8,6))
plt.boxplot([y[(H_vec == 0) & (y[:, -1] < val), i], y[(H_vec == 1) & (y[:, -1] < val), i]], labels=['H=0', 'H=1'])
plt.ylabel(feature_names[i])
plt.title(f'Box Plot of {feature_names[i]} by Class')
plt.legend()
plt.grid(True)
plt.show()

Feature 0:
H=0: mean=1.0436, std=0.0104, min=0.9959, max=1.0945, q50=1.0437
H=1: mean=0.8053, std=0.2347, min=-0.0169, max=1.0833, q50=0.8840

Feature 1:
H=0: mean=6.7223, std=0.0672, min=6.4095, max=7.0430, q50=6.7224
H=1: mean=5.2201, std=1.3743, min=-0.1110, max=6.9474, q50=5.6496

Feature 2:
H=0: mean=18.9788, std=0.1895, min=18.0946, max=19.8528, q50=18.9788
H=1: mean=14.7098, std=3.8378, min=-0.2724, max=19.6928, q50=15.8796

Feature 3:
H=0: mean=34.1965, std=0.3416, min=32.6052, max=35.7654, q50=34.1970
H=1: mean=26.7902, std=6.8505, min=-0.3451, max=35.5405, q50=28.9695

Feature 4:
H=0: mean=0.9807, std=0.0158, min=0.8069, max=1.0000, q50=0.9848
H=1: mean=0.9777, std=0.0223, min=0.0032, max=1.0000, q50=0.9832

Feature 5:
H=0: mean=0.9857, std=0.0117, min=0.8748, max=1.0000, q50=0.9887
H=1: mean=0.9706, std=0.0452, min=0.0001, max=1.0000, q50=0.9833

Feature 6:
H=0: mean=0.9867, std=0.0108, min=0.8813, max=1.0000, q50=0.9895
H=1: mean=0.9432, std=0.1037, min=0.0000, max=1.0000, q50=0.9791

Feature 7:
H=0: mean=0.9873, std=0.0103, min=0.8577, max=1.0000, q50=0.9900
H=1: mean=0.9283, std=0.1742, min=0.0000, max=1.0000, q50=0.9810

Feature 8:
H=0: mean=0.9851, std=0.0062, min=0.9289, max=0.9993, q50=0.9860
H=1: mean=0.9549, std=0.0757, min=0.1734, max=0.9990, q50=0.9795

Feature 9:
H=0: mean=1.5069, std=0.1807, min=1.0314, max=2.6736, q50=1.4828
H=1: mean=1369.4844, std=906094.1906, min=1.0594, max=641042347.3130, q50=2.0916

Feature selection (Step 3)

Note

I have chossen to add an domy variable there holdes no information about the damage states, this is add as the last feature in the feature vector.

Feature selection

  • “Mutual information” (Herlau et al., p. 89) (pdf): Measures dependency between feature and label
    • ´from sklearn.feature_selection import mutual_info_classif´
  • Statistical tests
    • t-test (binary classification) (ANOVA (multi-class))
  • Model-based feature importance
    • Random Forest feature importance
    • Lasso regression coefficients
  • Recursive Feature Elimination (RFE)
    • Train model
    • Remove least important feature
    • Repeat

Note

If I want can i do K-fold cross validation to evaluate the performance of the feature selection method, and to select the optimal number of features. This has some of the smae vibes as RFE, but it is more computational expensive if i have a lot of features or an lot of model parametes to optimize.

Principle component analysis (PCA)

Why is it bad for feature selection?

  • ignore the class labels
  • keeps features with high variance, but this is not always the components that hold the most information about the class labels.
  • may discard low-variance features that are actually very discriminative for the classification.

Note: PCA is a dimensionality reduction technique that transforms the original features into a new set of uncorrelated features (principal components) that capture the most variance in the data. However, it does not consider the class labels, which can lead to suboptimal feature selection for classification tasks.

Statistical tests

Focus on

  • t-test (binary classification) (ANOVA (multi-class))
  • ANOVA (multi-class)

ANOVA (difference between means)

We can use the ANOVA test to compare the means of the features between the two classes (H=0 and H=1) to see if there is a statistically significant difference between the mean values in the two classes.

t-test = ANOVA for binary classification

What is the null hypothesis for the t-test?

  • \(H_0\): The mean of the feature is the same for both classes (no difference between classes).
  • \(H_1\): The mean of the feature is different for the two classes (there is a difference between classes).

iamge form page 354 in the book form the introduction to statistics on DTU.

where:

  • n is the number of samples
  • k is the number of groups (classes)

F-value, Test statistic for ANOVA:

\[ F = \frac{\text{between-group variability}}{\text{within-group variability}} = \frac{SS /(k-1)}{SSE / (n-k)} \]

P-value:

  • The probability of observing an F-value as extreme as, or more extreme than, the one calculated is found in F-distribution under the null hypothesis. scipy.stats.f

Note

Here can the values for the ANOVA test be seen, for all data included in the model.

  • It can be seen that the Modal flexibility values have an Low F-value and a high p-value, this means that the mean of the Modal flexibility values for class 0 and class 1 are not significantly different.
    • This is do to the big variance in the Modal flexibility whit in the groups. this variance comes from the power teram of the natrual frequenci in the denominator of the Modal flexibility formula, where the natural frequencies have a big variance.

Note on P-value

The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated, assuming that the null hypothesis is true.

F-statistic: 
[5.13793338e+05 5.95003440e+05 6.16149884e+05 5.82021160e+05
 6.02140937e+03 5.27007496e+04 8.72770266e+04 5.70997350e+04
 7.88654425e+04 2.44733455e+00 9.39035713e-02]

p-value:
[0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.11772494 0.75927208]

Within group variance (SSE):
[6.60591705e+05 6.26957428e+05 6.18754020e+05 6.32102332e+05
 9.94014619e+05 9.49937482e+05 9.19728657e+05 9.45984432e+05
 9.26899514e+05 2.57651588e+15 9.99999906e+05]

here is the result form ANOVA only use features up to the 99.9 quantile of the modal flexibility.

  • here can it be seen that the variaction in mean values between class 0 and class 1 for the Modal flexibility is more significant.
Feature Importance (F-values):
 [5.17154402e+05 6.01069799e+05 6.22650334e+05 5.88019396e+05
 5.99569843e+03 5.27811385e+04 8.76988201e+04 5.67667389e+04
 7.87979781e+04 1.54496937e+06 9.76022482e-02]

Feature Importance (p-values):
 [0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.75472629]

Mutual information

How much does knowing this feature reduce uncertainty about the class?

“5.5.3 Mutual information” (Herlau et al., p. 89) (pdf)

Mutual Information

where:

  • \(I(X; Y)\) is the mutual information between feature \(X\) and class label \(Y\).
  • \(P(Y)\) is the probability of being in each class.
    • this is 0.5 given in the cration of the data set.
  • \(P(X, Y)\) is the joint probability of feature \(X\) and class label \(Y\).
  • \(P(X)\) is the probability of feature \(X\). -This distribution is pratically unknown, but it can be estimated from the data by the use of K-nearest neighbor (kNN) density estimation, Density Estimation
Feature Importance (Mutual Information):
 [0.459 0.529 0.538 0.533 0.004 0.045 0.1   0.093 0.113 0.693 0.   ]

What can we see?

\[ y = [\omega_1, ... , \omega_n, MAC_{1,1}, ... , MAC_{n,n}, TMAC, MF, dummy] \]

  • It can be seen that omega and MF is quite informative
  • it can alos be seen that the Mac values are lees informative
    • MAC 1,1 and 2,2 give an samller value of informaiton then
    • MAC 3.3 and 4.4 which are more informative.
    • TMAC is the most informative MAC value??
  • It is angine seen that the dummy variable is captured as holing no relevant informaiton about the class labels.

Model-based feature importance

  • Logistic regression coefficients
  • Random Forest feature importance for an Decision Tree model
  • Permutation importance (for any model)

Logistic regression coefficients

  • The features whit the higset absolute vights are the most important?

Code cell 1: L2 regularization (default)

code cell 1 uses L2 regularization, which penalizes large coefficients and in some cases can lead to better generalization performance.

Penalty: \[ \sum b_i^2 \]

where \(b_i\) are the coefficients of the logistic regression model.

from sklearn.linear_model import LogisticRegression

model_log = LogisticRegression(penalty='l2')
# model_log.fit(X, H_vec)
model_log.fit(X_new, H_vec_new)

importance = model_log.coef_
print("Feature Importance (Logistic Regression Coefficients):\n",  np.round(importance, 3))
Feature Importance (Logistic Regression Coefficients):
 [[ -6.097  -9.208 -10.349  -8.983   0.018   0.077   0.346   1.03    0.718
   14.023  -0.266]]

Code cell 2: L1 regularization (Lasso-style)

code cell 2 uses L1 regularization, which have an small punishment on large coefficients, but it can lead to sparse models where some coefficients are exactly zero, effectively performing feature selection.

Penalty: \[ \sum |b_i| \]

where \(b_i\) are the coefficients of the logistic regression model.

from sklearn.linear_model import LogisticRegression

# Logistic regression with L1 regularization
model_log = LogisticRegression(
    penalty='l1',
    solver='liblinear',
    C=1.0
)
# model_log.fit(X, H_vec)
model_log.fit(X_new, H_vec_new)

importance = model_log.coef_
print("Feature Importance (Logistic Regression Coefficients):\n",  np.round(importance, 3))
Feature Importance (Logistic Regression Coefficients):
 [[-0.046 -1.626 -1.089 -1.091  0.112  0.006  0.221  0.825 -0.235  0.
  -0.   ]]

Random Forest feature importance

“9.1 Classification trees” (Herlau et al., p. 154) (pdf)

The general idea is to find the decision rule that leads to the biggest purity gain. In the book, it is given as below.

where:

  • \(I()\) is the impurity measure (e.g., Gini impurity, entropy)
  • \(N(v_k)\) is the total number of samples of class \(k\) in branch \(v\)
  • \(N(r)\) is the number of samples in the root

here is the most common impurity measures: (sklearn uses Gini impurity)

  • \(p(c|v)\) is the proportion (fraction) of samples of class \(c\) in branch \(v\).

Random Forest

Random Forests is an technique where mulitiple decision trees are trained on different data sets build whit bootstrapping and random feature selection, and then the final prediction is made by averaging the predictions of the individual trees (for regression) or by majority voting (for classification). This technique helps to reduce overfitting and improve generalization performance compared to a single decision tree.

The total feature purity gain can then be computed by

\[ \text{Feature Importance}(X_j) = \sum_{\text{split on } X_j}^{T} \Delta \]

In the code is this then normalized whit the sum of all feature importance values.

\[ importance(X_j) = \frac{\text{Feature Importance}(X_j)}{\sum_{k=1}^{M} \text{Feature Importance}(X_k)} \]

code_run = True
code_run = False
if code_run:
    from sklearn.ensemble import RandomForestClassifier

    model_rf = RandomForestClassifier()
    # model_rf.fit(X, H_vec)
    model_rf.fit(X_new, H_vec_new)

    # Save ML model whit pickle
    import pickle
    with open("data/ML/random_forest_model.pkl", "wb") as f:
        pickle.dump(model_rf, f)
else:
    # Load ML model with pickle
    import pickle
    with open("data/ML/random_forest_model.pkl", "rb") as f:
        model_rf = pickle.load(f)

importance = model_rf.feature_importances_
print("Feature Importance (Random Forest):\n", np.round(importance, 3))
Feature Importance (Random Forest):
 [0.08  0.099 0.202 0.161 0.    0.001 0.01  0.001 0.009 0.436 0.   ]

Permutation importance

what does “sklearn.inspection.permutation_importance” do?

  • It computes the moldes preformane on the test set.
  • Then do it randomly permute/shuffle the values of a single feature.
  • Then it computes the model performance on the test set again with the permuted feature.

This is done for all features and the importance of each feature is then computed as the decrease in model performance when that feature is permuted.

Note on permutation importance

  • This gives some insight to how much the an specific desigin model is relying on an specific feature for making its predictions.

Results are only valid for the specific desigin model.

from sklearn.inspection import permutation_importance
result = permutation_importance(model_log, X_new, H_vec_new, n_repeats=4)

importance = result.importances_mean
print("Feature Importance (Permutation Importance):\n", np.round(importance, 3))
Feature Importance (Permutation Importance):
 [-0.001  0.144  0.099  0.094  0.001  0.     0.003  0.008  0.004  0.
 -0.   ]
# from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X_new, H_vec_new, random_state=42
)

# Train model
# model = RandomForestClassifier(random_state=42)
# model.fit(X_train, y_train)
model_log2 = LogisticRegression()
# model_log.fit(X, H_vec)
model_log2.fit(X_train, y_train)

# Compute permutation importance
result = permutation_importance(
    model_log2,
    X_test,
    y_test,
    n_repeats=10, # number of times to permute a feature
    random_state=42
)

# Print importance scores
print("Feature Importance (Permutation Importance):\n", np.round(result.importances_mean, 3))
Feature Importance (Permutation Importance):
 [ 0.044  0.074  0.083  0.071  0.     0.     0.     0.    -0.     0.323
  0.   ]

Recursive Feature Elimination (RFE)

what dos Recursive Feature Elimination (RFE) do?

  • It trains a model.
  • measure feature importance (Only some model claculate feature importance)
  • Remove the least important feature.
  • Repeat until the desired number of features is reached.
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

rfe = RFE(
    estimator=model,
    n_features_to_select=1
)

rfe.fit(X_train, y_train)
print("Feature Ranking (RFE):\n", rfe.ranking_)
Feature Ranking (RFE):
 [ 5  4  1  2 10 11  9  6  7  3  8]

Other Options for feature selection:

  • Do K-fold cross validation to evaluate the decision model performance for different subsets of features, and then select the subset that gives the best performance.

note: This is more computational expensive if i have a lot of features or an lot of model parametes to optimize.

Chsoen data split

I have chosen to use the reduced feature set of all of all omega value and then the modal flexibility.

# select the first 4 features and and MF that is number 8
selected_features = [0, 1, 2, 3, 8]
X_reduced = X[:, selected_features].copy()
X_reduced

# split the reduced data + matching delta_E_samples
X_train, X_test, y_train, y_test, delta_E_train, delta_E_test, states_train, states_test = train_test_split(
    X_reduced,
    H_vec,
    delta_E_samples,
    states_bar,
    test_size=0.2,
    random_state=42
)

# make the same 

Training of decision model (Step 4)

I have 2 options for building an decision rule:

  1. Fit an Density Estimator on my data an ude Bayes rule to make a decision.
    • Gaussian Mixture Model (GMM)
    • K-nearest neighbor (kNN) density estimation
    • Kernel Density Estimation (KDE)
  2. Train a Black box model to make a decision.
    • Logistic regression
    • Decision Tree
      • Random Forest
    • K-nearest neighbor (kNN) classification

Density Estimation

What is my options now?

  • Probabilistic models
    • Estimate \(P(y \mid H_i)\) from data (Simy supervised learning)
    • Estimate \(P(y \mid \theta_i)\) from data (Simy supervised learning)
    • Estimate \(P(\theta_{new} \mid y)\) from data (100% unsupervised learning)
      • \(\theta_{new}\) the new damage states that the unsupervised model constructs based on the features.

It is hard to estimate this probabilitys analytically, one because we do not kow the OMA error function??? and two if we did would we not get a nice Analytical form suliton for the modal flexibility, and the MAC values.

Density Estimation form data?

  • Gaussian Mixture Model (GMM)
  • K-nearest neighbor (kNN) density estimation
  • Kernel Density Estimation (KDE)

Gaussian Mixture Model (GMM) - Genral idea

A single Gaussian distribution is:

\[ \mathcal{N}(\mu, \Sigma) \]

where:

  • \(\mu\) = mean vector
  • \(\Sigma\) = covariance matrix

A GMM combines several Gaussians:

\[ p(x)=\sum_{k=1}^{K}\pi_k \mathcal{N}(x \mid \mu_k,\Sigma_k) \]

where:

  • \(K\) = number of clusters
  • \(\pi_k\) = cluster probability
  • \(\mu_k\) = cluster mean
  • \(\Sigma_k\) = covariance matrix

Covariance matrix for GMM in sklearn

covariance_type=‘full’:

  • Each cluster gets its own full covariance matrix.

covariance_type=‘diag’:

  • Each cluster gets its own diagonal covariance matrix.

covariance_type=‘spherical’:

  • Each cluster has one variance value. (clusters are circular)

covariance_type=‘tied’:

  • All clusters share one covariance matrix.

What have i done?

Here in the code below have i trained the GMM for 2 clusters, each whit there own full covariance matrix. this corespond to.

\[ P(\theta_{new} \mid x) \]

  • This is whith the assumption that the data can be well described by 2 Gaussian distributions, each with its own mean and covariance matrix.
  • \(x\) is the feature vector.

Note: This model makes 2 new classes but it is not equal to the classes being H=0 and H=1.

Baddd Generalization

  • This model hass learned the worng prior wights, because of the importance sampling the data set.
  • there for whut the priors need to be updated to what was expected in an real casse (GMM Weights)
GMM Converged: True
GMM Weights: [0.56634945 0.43365055]
GMM Means:
 [[ 0.57569601  0.60176465  0.60736899  0.59697979  0.27123454]
 [-0.75085048 -0.78528921 -0.79271114 -0.77919865 -0.35331963]]

data = X_test[:10**3, :]
gmm_labels = gmm.predict(data)
# print("GMM Cluster Assignments:\n", gmm_labels)

plt.figure(figsize=(8,6))
plt.scatter(data[:, 0], data[:, 1], c=gmm_labels, cmap='viridis', alpha=0.5)
plt.xlabel(r'$\omega_1$')
plt.ylabel(r'$\omega_2$')
plt.title('GMM Clustering of Features')
# plot the GMM means
plt.scatter(gmm.means_[:, 0], gmm.means_[:, 1], c='red', marker='X', s=200, label='GMM Means')
# PLOT the 10, 20 , 30 , 40 , 50, 60, 70, 80, 90% quantity contour of the GMM components
prob = gmm.predict_proba(data)
from scipy.stats import chi2
if gmm.covariance_type != 'tied':
    for i in range(gmm.n_components):
        mean = gmm.means_[i]
        cov = gmm.covariances_[i]
        # compute the eigenvalues and eigenvectors of the covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)
        # sort the eigenvalues and eigenvectors in descending order
        order = np.argsort(eigvals)[::-1]
        eigvals = eigvals[order]
        eigvecs = eigvecs[:, order]
        # compute the angle of the ellipse
        angle = np.arctan2(eigvecs[1, 0], eigvecs[0, 0])
        # compute the width and height of the ellipse for different confidence levels
        for conf in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
            chi2_val = chi2.ppf(conf, df=2)
            width = 2 * np.sqrt(eigvals[0] * chi2_val)
            height = 2 * np.sqrt(eigvals[1] * chi2_val)
            # plot the ellipse
            from matplotlib.patches import Ellipse
            ellipse = Ellipse(
                xy=mean,
                width=width,
                height=height,
                angle=np.degrees(angle),
                edgecolor='red',
                facecolor='none',
                linestyle='--',
                label=f'{int(conf*100)}% Confidence Ellipse' if i == 0 else None
            )
            plt.gca().add_patch(ellipse)
plt.legend()
plt.ylim(data[:, 1].min(), data[:, 1].max())
plt.xlim(data[:, 0].min(), data[:, 0].max())
plt.grid(True)
plt.show()

How to read an roc curve?

Receiver Operating Characteristic (ROC) curve, StatQuest:

  • y-axis: True Positive Rate (TPR) = \(\frac{TP}{TP + FN}\)
    • Number of true positives (TP) relative to the total number of actual positives (H=1).
  • x-axis: False Positive Rate (FPR) = \(\frac{FP}{FP + TN}\)
    • Number of false positives (FP) relative to the total number of actual negatives (H=0).
  • Area Under the Curve (AUC): A single scalar value that summarizes the overall performance of the classifier. AUC ranges from 0 to 1, where 1 indicates perfect classification and 0.5 indicates random guessing.

Note: Each point on the ROC curve corresponds to a different threshold for classifying a sample.

K-nearest neighbor (kNN) density estimation

Reference

General idea:

  1. find the distance to the k-th nearest neighbor. \(r_k(x)\)
  2. Compute the volume \(V(x)\)
  3. Estimate the density \(\hat{p}(x)\)

\[ \hat{p}(x) = \frac{k}{n V(x)} \]

where:

  • \(k\) is the number of nearest neighbors
  • \(n\) is the total number of samples
  • \(V(x) = c_d r_k(x)^d\) is the volume of the hypersphere defined by the distance to the k-th nearest neighbor.
    • \(c_d\) is a constant depending on the dimension \(d\)
    • \(r_k(x)\) is the distance from the point \(x\) to its k-th nearest neighbor.
    • \(d\) is the dimensionality of the feature space.

What have i done?

I split the Training data up into 2 classes H=0 and H=1, and then i trained a kNN density estimator for each class. this corespond to.

\[ P(x \mid H) = \frac{k}{n V(x)}, \quad \text{where } x = H, \quad H \in \{0, 1\} \]

  • \(x\) is the feature vector.

Note: this is the liklihood function that i can use in Bayes rule to compute the posterior distribution \(P(H \mid x)\)

knn_density = KNN_density_estimator(n_neighbors=20)
knn_density.P_H1_given_X_train(X_train[y_train == 1])
knn_density.P_H0_given_X_train(X_train[y_train == 0])

p_H1 = knn_density.P_H1_given_X(X_test)
p_H0 = knn_density.P_H0_given_X(X_test)
Training done with KNN density estimator with 400564 samples and 5 features
Training done with KNN density estimator with 399436 samples and 5 features

Bayesian rule for decision making

Use the law of total probability: \[ P(H \mid x) \propto P(x \mid H) P(H) \]

Use an index of 0.5 to make a decision: \[ d = \begin{cases} 1 & \text{if } P(H=1 \mid x) > 0.5 \\ 0 & \text{otherwise} \end{cases} \]

# Bayesian decision rule
knn_prior = np.array([0.5, 0.5]).T # prior probabilities for H=0 and H=1

posterior = np.multiply(knn_prior, np.array([p_H0, p_H1]).T) # unnormalized posterior probabilities
posterior = posterior / posterior.sum(axis=1, keepdims=True) # normalize to get proper probabilities

# classify based on maximum posterior probability
knn_predictions = np.argmax(posterior, axis=1)

This Histogram is somewhat misleading

This histogram is misleading because i have the same number of bines for both casses, i would be better to use some scaling rull like the one in the prob-book bassed on the number of samples

Kernel Density Estimation (KDE)

Note don !!

Analysis 1

  • Evaluation method: Monte Carlo simulation with importance sampling
  • likelihood: Explicit likelihood for Modal properties (no use of features)
  • Decision rule: Maximum a posteriori (MAP) decision rule \(d(y) = \arg\max_{H} p(H \mid y)\)

New stuff:

  • Continuous damage reduction factor: \(\delta E\)
    • \(P(\delta E \mid \theta)\): Probability of level of stiffness reduction
  • \(P(\text{failure} \mid \delta E, \theta)\): Probability of failure given stiffness reduction and damage state
  • Updated cost function: Expected cost given damage

Test

Likelihood evaluation P(y | theta)

Main approaches to build the full distribution

Method One-time cost Reusable Accuracy
Monte Carlo per y cheap high
Quadrature medium high
Sampling + KDE expensive high
Gaussian mixture expensive very high

Grid evaluation (slow to build, but fast to evaluate)

✅ Pros:

  • simple
  • robust

❌ Cons:

  • expensive to build
  • resolution limited

Build density from y samples? (fast to build, but slow to evaluate)

  • generate y samples
  • build density from the samples (e.g., kernel density estimation, histogram, etc.)

✅ Pros:

  • very flexible
  • works perfectly with black-box g
  • reusable

❌ Cons:

  • KDE smoothing error

Mixture approximation (simple and precise, but slow to evaluate)

Note

  • This is what i have done with MC sampling earlier

Surrogate model of g(x)

  1. Approximate g(x) with a smooth model
  2. Use fast integration afterward

case study Overview

Chosen Features

Modal properties:

  • x first natural frequencies
  • x first mode shapes

Features:

  • natural frequencies
  • Total Modal Assurance Criterion (TMAC)
    • \(MAC(\phi_a,\phi_b)=\frac{|\phi_a^T\phi_b|^2}{(\phi_a^T\phi_a)(\phi_b^T\phi_b)}\)
    • with no damage as reference \(H_0\)
  • Modal Flexibility
    • \(F = \sum_{i=1}^{n} \frac{\phi_i \phi_i^T}{\omega_i^2}\)

Evaluation of the Bayes risk

Numarical integration:

  • MCS with importance sampling

Decision rules:

  • Explicit likelihood for Modal properties (no use of features)
    • Maximum a posteriori (MAP) decision rule \(\delta(y) = \arg\max_{a} p(H \mid y)\)
  • Supervised Machine learning (ML) classification based on the features (no use of likelihood)
    • Logistic regression

Prior distibution

H_1 = 0.01
H_0 = 1-H_1

# uninformed prior
N_prior = 2**4
prior_H = np.array([H_0, H_1])
prior_theta = np.concatenate(([H_0], np.repeat(H_1 / (N_prior - 1), N_prior - 1)))

print(f"Prior probabilities for H: {prior_H}")
print(f"\nPrior probabilities for θ: {np.round(prior_theta, 4)}")
Prior probabilities for H: [0.99 0.01]

Prior probabilities for θ: [9.9e-01 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04
 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04 7.0e-04]

Cost function

\[ C_\text{total} = C_\text{prediction}(H, d) + C_\text{sensors}(e) \]

\[ C_\text{sensors}(e) = 5 + 1 \cdot \text{number of sensors} \]

\[ C_\text{prediction} = \begin{bmatrix} 0 & 50 \\ 10000 & 10 \end{bmatrix} = \begin{bmatrix} \text{True Negative} & \text{False Positive} \\ \text{False Negative} & \text{True Positive} \end{bmatrix} \]

where:

  • Positive: is the prediction of damage.
  • Negative: is the prediction of no damage.
  • True Positive (TP): Correctly predicting damage when it is present.
  • False Positive (FP): Incorrectly predicting damage when it is not present.
  • True Negative (TN): Correctly predicting no damage when it is not present.
  • False Negative (FN): Incorrectly predicting no damage when there is actually damage.
# C [H, d] = the prediction cost
prediction_cost = np.array([[0,   50],  # TN, FP
                            [10000, 100]   # FN, TP
                            ])  

H = 1; d = 1 
print(f"Prediction cost: {prediction_cost[H, d]}, which is the cost of predicting {d} when the true state is {H}.")

def sensor_cost(n_sensors):
    return 5 + 1 * n_sensors
Prediction cost: 100, which is the cost of predicting 1 when the true state is 1.