Note

Machine Learning - Gaussian Naive Bayes

Formula

According to Bayes’ theorem:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$
Using the naive conditional independence assumption that

$$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$$

for all $i$, this relationship is simplified to

$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$$

Since $P(x_1, \dots, x_n)$ is constant given the input, we can use the following classification rule:

$$\hat{y} = \arg\max_y \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of $P(x_i \mid y)$.

For Gaussian Naive Bayes, the likelihood of the features is assumed to be Gaussian:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

where $\mu_y$ and $\sigma_y^2$ are the per-class mean and variance of feature $x_i$, estimated here by the sample mean and variance.
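For instance, this density can be evaluated for a single feature value with NumPy (the parameter values below are made up purely for illustration):

```python
import numpy as np

# Hypothetical class parameters for one feature: mean 2.0, variance 0.5
mu_y, var_y = 2.0, 0.5
x_i = 1.3

# Gaussian likelihood P(x_i | y)
p = (1.0 / np.sqrt(2 * np.pi * var_y)) * np.exp(-((x_i - mu_y) ** 2) / (2 * var_y))
print(p)  # ≈ 0.3456
```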

Code

import numpy as np


class GaussianNB:
    def __init__(self):
        pass

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes = np.unique(y)
        n_classes = self.classes.shape[0]

        # Per-class feature means, per-class feature variances, and class priors
        self.mu = np.zeros((n_classes, n_features))
        self.var = np.zeros((n_classes, n_features))
        self.priors = np.zeros(n_classes)

        for i, y_i in enumerate(self.classes):
            X_i = X[y == y_i, :]
            self.mu[i, :] = np.mean(X_i, axis=0)
            self.var[i, :] = np.var(X_i, axis=0)
            self.priors[i] = X_i.shape[0] / X.shape[0]
        return self

    def predict(self, X):
        n_samples = X.shape[0]
        y_pred = np.zeros(n_samples)
        for i in range(n_samples):
            # Gaussian density of each feature under each class: shape (n_classes, n_features)
            density = (1.0 / np.sqrt(2 * np.pi * self.var)) * np.exp(-((X[i] - self.mu) ** 2) / (2 * self.var))
            # Posterior up to a constant: prior times the product of per-feature likelihoods
            prob_density = self.priors * np.prod(density, axis=1)
            # Map the argmax index back to the original class label
            y_pred[i] = self.classes[np.argmax(prob_density)]
        return y_pred
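A quick end-to-end check on synthetic data (the class is repeated in condensed form so the snippet runs on its own; the two-cluster data is just for illustration):

```python
import numpy as np


class GaussianNB:
    def fit(self, X, y):
        self.classes = np.unique(y)
        n_classes, n_features = self.classes.shape[0], X.shape[1]
        self.mu = np.zeros((n_classes, n_features))
        self.var = np.zeros((n_classes, n_features))
        self.priors = np.zeros(n_classes)
        for i, y_i in enumerate(self.classes):
            X_i = X[y == y_i]
            self.mu[i], self.var[i] = X_i.mean(axis=0), X_i.var(axis=0)
            self.priors[i] = X_i.shape[0] / X.shape[0]
        return self

    def predict(self, X):
        y_pred = np.zeros(X.shape[0])
        for i in range(X.shape[0]):
            density = (1.0 / np.sqrt(2 * np.pi * self.var)) * np.exp(-((X[i] - self.mu) ** 2) / (2 * self.var))
            y_pred[i] = self.classes[np.argmax(self.priors * np.prod(density, axis=1))]
        return y_pred


# Two well-separated 2-D Gaussian clusters: class 0 near (0, 0), class 1 near (5, 5)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = GaussianNB().fit(X, y)
accuracy = (clf.predict(X) == y).mean()
print(accuracy)  # training accuracy; close to 1.0 on this easy problem
```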
machine-learning