Commit 5bf79bd0 authored by Carlos A. Iglesias

Changed mosaik-demo to securegrid-simulator

parent 19cd9933
syntax: glob
*.egg-info/
*.pyc
*.ropeproject
*~
.DS_Store
.coverage
MANIFEST
dist/*
docs/_build/*
htmlcov/*
venv/
/demo.hdf5
/data/profiles.data
{
"cells": [],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}
# SecureGrid
Deep Learning based Attack Detection System for Smart Grids
## Required modules
* Keras
* numpy 1.16.1. There is a known bug in numpy 1.16.3, so the numpy version should be pinned to 1.16.1:

```
pip uninstall numpy
pip install --upgrade numpy==1.16.1
```
## Mosaik
To run the demo, a number of packages are needed:

```
sudo apt-get install git python3-numpy python3-scipy python3-h5py
```

In addition, there is a bug in arrow that affects the mosaik demo, so you should install arrow 0.14:

```
pip install arrow==0.14
```
Execute:

```
python securegrid-demo.py
```

After it finishes, you will have the simulation data in the file demo.hdf5. You can visualize this file with any HDF5 viewer.
In our case, we use ViTables (http://vitables.org/Download/). To install it, follow the installation instructions; the suggested process is:

```
apt install libhdf5-dev
pip install pyqt5
pip install vitables
```

Then execute `vitables` in a terminal.
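Alternatively, you can take a quick programmatic look at the results with h5py (a minimal sketch; the group and dataset layout inside demo.hdf5 depends on the scenario, so this simply walks the tree and prints what it finds):

```python
import h5py

# Print the path, shape and dtype of every dataset in the file.
def print_item(name, obj):
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape, obj.dtype)

with h5py.File('demo.hdf5', 'r') as f:
    f.visititems(print_item)
```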
If you wish to visualize the scenario, you can install maverig (https://bitbucket.org/mosaik/maverig/src/master/). Basically:

```
pip install maverig
```
## Usage
In order to detect attacks, the power consumption values of the houses are analyzed. For that reason, the DataFrames that feed the neural network (an autoencoder) have to be created first.
For this purpose, the dataframe_creation notebook is used. It generates .pkl files containing the DataFrames with the necessary data. These DataFrames contain the following features (a sketch of the window-feature computation follows the table):
| Feature | Description |
| ------------- | ------------- |
| Day | Current day of the first window value |
| Hour | Current hour of the first window value |
| Minute | Current minute of the first window value |
| Pn | Power consumption window values |
| Mean | Mean of the window values |
| Mean_i - Mean_i-1 | Difference between the mean of the window values and the mean of the previous window values |
| s | Standard deviation of the window values |
| Pn - P1 | Difference between the last and first value of the window |
| Q1 | First quartile of the window values |
| Q2 | Median of the window values |
| Q3 | Third quartile of the window values |
| IQR | Interquartile range of the window values |
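For reference, here is a minimal sketch of how such window features could be computed with pandas. The window length, the DatetimeIndex on the power series, and the column names are assumptions for illustration; the notebook's actual parameters may differ:

```python
import pandas as pd

def window_features(power: pd.Series, window: int = 16) -> pd.DataFrame:
    """Build one feature row per non-overlapping window of power values.

    Assumes `power` is indexed by a DatetimeIndex.
    """
    rows = []
    prev_mean = None
    for start in range(0, len(power) - window + 1, window):
        w = power.iloc[start:start + window]
        q1, q2, q3 = w.quantile([0.25, 0.5, 0.75])
        mean = w.mean()
        rows.append({
            'Day': power.index[start].day,
            'Hour': power.index[start].hour,
            'Minute': power.index[start].minute,
            **{f'P{i + 1}': v for i, v in enumerate(w)},  # raw window values
            'Mean': mean,
            # Difference between this window's mean and the previous one's
            'Mean_diff': mean - prev_mean if prev_mean is not None else 0.0,
            's': w.std(),
            'Pn-P1': w.iloc[-1] - w.iloc[0],
            'Q1': q1, 'Q2': q2, 'Q3': q3, 'IQR': q3 - q1,
        })
        prev_mean = mean
    return pd.DataFrame(rows)
```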
To execute the notebook, the h5py package is needed:

```
pip install h5py
```
Once the DataFrames are created, they are used to feed the autoencoder. To do so, the conv1d_autoencoder.py file has to be configured.
The normal_data_path variable has to point to the .pkl file containing attack-free data, that is, the normal behaviour of the houses. The attack_data_path variable has to point to the .pkl file containing the data to be analyzed for attacks.
Furthermore, to train the autoencoder, the DO_TRAINING variable has to be set to True.
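For example, the relevant variables in conv1d_autoencoder.py look like this (the paths below are the ones shipped with the script; adjust them to your own data):

```python
DO_TRAINING = True  # set to False to reuse a previously trained model

normal_data_path = '/normal/houses_concatenated.pkl'    # attack-free data
attack_data_path = '/anomaly_20/labels_df/house_0.pkl'  # data to analyze
```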
Finally, the following command executes the system:

```
$ python conv1d_autoencoder.py
```
## Results
Once the system has been executed, it generates the predicted_labels.csv file, which contains one label per DataFrame entry, classifying it as attack (1) or normal behaviour (0).
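A quick way to inspect this output (a minimal sketch; the file holds a single unnamed column of 0/1 values, one per row, as written by the script):

```python
import pandas as pd

# The CSV has no header: one 0/1 label per line.
labels = pd.read_csv('predicted_labels.csv', header=None, names=['attack'])
print(labels['attack'].value_counts())  # entries per class
print(f"{labels['attack'].mean():.1%} of entries flagged as attacks")
```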
import mosaik_api


meta = {
    'models': {
        'Attack': {
            'public': True,
            'params': ['target_attr'],
            'attrs': ['P_out_val'],
        },
    },
}

attackPercentageValue = 0  # value the attacked attribute is forced to
attackTime = 5             # simulation progress (%) after which the attack starts


class Attack(mosaik_api.Simulator):
    def __init__(self):
        super().__init__(meta)
        self.units = {}

    def init(self, sid, step_size=15 * 60):
        self.sid = sid
        self.step_size = step_size
        return self.meta

    def create(self, num, model, **model_params):
        n_units = len(self.units)
        entities = []
        for i in range(n_units, n_units + num):
            eid = 'Attack-%d' % i
            self.units[eid] = model_params
            entities.append({'eid': eid, 'type': model})
        return entities

    def step(self, time, inputs):
        commands = {}
        # Async request to mosaik: how far the simulation has progressed.
        progress = yield self.mosaik.get_progress()
        if progress > attackTime:
            for eid, attrs in inputs.items():
                for attr, vals in attrs.items():
                    if attr == 'P_out_val':
                        for src_id, val in vals.items():
                            target_id = src_id
                            if eid not in commands:
                                commands[eid] = {}
                            target_attr = self.units[eid]['target_attr']
                            if target_id not in commands[eid]:
                                commands[eid][target_id] = {}
                            # Overwrite the target attribute with the attack value.
                            commands[eid][target_id][target_attr] = attackPercentageValue
        yield self.mosaik.set_data(commands)
        return time + self.step_size


def main():
    return mosaik_api.start_simulation(Attack(), 'example attack')


if __name__ == '__main__':
    main()
import pandas as pd
import numpy as np
import operator
import csv
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error
# from keras_anomaly_detection.library.plot_utils import visualize_reconstruction_error
from keras_anomaly_detection.library.convolutional import Conv1DAutoEncoder
DO_TRAINING = False
def main():
    data_dir_path = './data'
    model_dir_path = './models'
    normal_data_path = '/normal/houses_concatenated.pkl'
    attack_data_path = '/anomaly_20/labels_df/house_0.pkl'

    # Read normal data (no attacks)
    houses = pd.read_pickle(data_dir_path + normal_data_path)
    print(houses.head())
    houses = houses.drop("attacked", axis=1)
    houses_m = houses.values
    scaler = MinMaxScaler()
    houses_m_n = scaler.fit_transform(houses_m)

    train, test = train_test_split(houses_m_n, test_size=0.3)

    ae = Conv1DAutoEncoder()
    # Fit the data and save the model into model_dir_path
    if DO_TRAINING:
        ae.fit(train, model_dir_path=model_dir_path)

    # Load back the model saved in model_dir_path and detect anomalies
    ae.load_model(model_dir_path)
    predicted_train = ae.predict_mine(train)
    mae_train = mae(train, predicted_train)
    rsme_train = rsme(train, predicted_train)
    print("MAE_TRAIN", mae_train, "MEAN", np.mean(mae_train))
    print("RSME_TRAIN", rsme_train, "MEAN", np.mean(rsme_train))

    predicted_test = ae.predict_mine(test)
    mae_test = mae(test, predicted_test)
    rsme_test = rsme(test, predicted_test)
    print("MAE_TEST", mae_test, "MEAN", np.mean(mae_test))
    print("RSME_TEST", rsme_test, "MEAN", np.mean(rsme_test))
    print("NORMALIZED TEST MAE", np.mean(np.divide(mae_test, mae_train)))
    print("NORMALIZED TEST RSME", np.mean(np.divide(rsme_test, rsme_train)))

    df_house_normal = pd.read_pickle(data_dir_path + normal_data_path)
    labels_normal = df_house_normal["attacked"]
    df_house_normal = df_house_normal.drop("attacked", axis=1)
    scaled_house_normal = scaler.transform(df_house_normal)

    # Read attack data
    df_house_anomaly = pd.read_pickle(data_dir_path + attack_data_path)
    labels_anomaly = df_house_anomaly["attacked"]
    df_house_anomaly = df_house_anomaly.drop("attacked", axis=1)
    scaled_house_anomaly = scaler.transform(df_house_anomaly)

    # Per-sample reconstruction RSME on normal data
    normal_rsme = []
    for i in range(len(houses_m_n)):
        out_normal = ae.predict_mine(houses_m_n[i:i + 1])
        rsme_normal = np.sqrt(mean_squared_error(houses_m_n[i:i + 1], out_normal, multioutput='raw_values'))
        normal_rsme.append(np.mean(rsme_normal))

    # Per-sample reconstruction RSME on the data to be analyzed
    anomaly_rsme = []
    for i in range(len(scaled_house_anomaly)):
        out_anomaly = ae.predict_mine(scaled_house_anomaly[i:i + 1])
        rsme_anomaly = np.sqrt(mean_squared_error(scaled_house_anomaly[i:i + 1], out_anomaly, multioutput='raw_values'))
        anomaly_rsme.append(np.mean(rsme_anomaly))

    # Normalize the anomaly error by the mean error on normal data
    normalized_rsme = np.divide(anomaly_rsme, np.mean(normal_rsme))
    plt.plot(normalized_rsme, marker='o', linestyle='', ms=3.5)
    plt.show()

    final_labels = window_error(normalized_rsme, labels_anomaly)
    with open('predicted_labels.csv', 'w') as outfile:
        out = csv.writer(outfile)
        out.writerows(map(lambda x: [x], final_labels))
    print("FINAL Labels", final_labels)
    print("CLASSIFICATION REPORT", metrics.classification_report(labels_anomaly, final_labels))
def get_th_opt(normalized_rsme, labels_anomaly, flag):
    # Sweep thresholds from 2 to 30 in steps of 0.1 and keep the one
    # that yields the best macro F1 score.
    j = 2
    predicted_labels = {}
    labels = labels_anomaly
    f_scores_th = {}
    while j < 30:
        predicted_labels[j] = []
        for i in normalized_rsme:
            if i >= j:
                predicted_labels[j].append(1)
            else:
                predicted_labels[j].append(0)
        f_scores_th[j] = metrics.f1_score(labels, predicted_labels[j], average='macro')
        j = round(j + 0.1, 2)
    f1_max, th_opt = max(zip(f_scores_th.values(), f_scores_th.keys()))
    if flag:
        f1_good, f_preds = filter_error(labels_anomaly, predicted_labels[th_opt])
        return f1_good, f_preds, th_opt
    return f1_max, th_opt
def window_error(normalized_rsme, labels_anomaly):
    # Smooth the error signal with sliding-window sums of growing size and
    # keep the predictions of the window size with the best F1 score.
    window_len = 25
    f1s = {}
    for i in range(0, window_len):
        window_arr = []
        for j in range(0, len(normalized_rsme)):
            if j < i:
                s = np.sum(normalized_rsme[0:j + 1])
            else:
                s = np.sum(normalized_rsme[j - i:j])
            window_arr.append(s)
        f1_max, preds, th_opt = get_th_opt(window_arr, labels_anomaly, True)
        f1s[f1_max] = preds
    print("F1MAX", max(f1s.items(), key=operator.itemgetter(0))[1])
    return max(f1s.items(), key=operator.itemgetter(0))[1]
def filter_error(anomaly, predicted):
    # Majority-style filter: re-label each entry as attack if more than 20%
    # of the previous j predictions were attacks; keep the best filter size.
    win = 50
    fs = {}
    for j in range(30, win):
        final_predictions = []
        for i in range(0, len(predicted)):
            if i < j:
                final_predictions.append(predicted[i])
            else:
                addition = np.sum(predicted[i - j:i])
                if addition > 0.2 * j:
                    final_predictions.append(1)
                else:
                    final_predictions.append(0)
        f1 = metrics.f1_score(anomaly, final_predictions, average='macro')
        fs[f1] = final_predictions
    f1_good = max(fs.items(), key=operator.itemgetter(0))[0]
    f_preds = max(fs.items(), key=operator.itemgetter(0))[1]
    return f1_good, f_preds
def anomaly_houses_4():
    # Prints and plots the reconstruction error of all the houses.
    # Note: this helper relies on names defined in main() (data_dir_path,
    # scaler, ae, mae_train, rsme_train) and on the commented-out
    # visualize_reconstruction_error import, so it is not callable as-is.
    print("------------NORMAL-MAE---------------------------------------")
    for i in range(0, 38):
        df_house_anomaly = pd.read_csv(data_dir_path + '/normal_f_csv/house_{}_row.csv'.format(i), header=None)
        scaled_house_anomaly = scaler.transform(df_house_anomaly)
        out_anomaly = ae.predict_mine(scaled_house_anomaly)
        mae_anomaly = mean_absolute_error(scaled_house_anomaly, out_anomaly, multioutput='raw_values')
        n_mae_anomaly = np.divide(mae_anomaly, mae_train)
        print(i, np.mean(mae_anomaly), np.mean(n_mae_anomaly))
    print("-------------NORMAL-RSME--------------------------------------")
    for i in range(0, 38):
        df_house_anomaly = pd.read_csv(data_dir_path + '/normal_f_csv/house_{}_row.csv'.format(i), header=None)
        scaled_house_anomaly = scaler.transform(df_house_anomaly)
        out_anomaly = ae.predict_mine(scaled_house_anomaly)
        rsme_anomaly = np.sqrt(mean_squared_error(scaled_house_anomaly, out_anomaly, multioutput='raw_values'))
        n_rsme_anomaly = np.divide(rsme_anomaly, rsme_train)
        print(i, np.mean(rsme_anomaly), np.mean(n_rsme_anomaly))
    print("------------ANOMALY-MAE---------------------------------------")
    mae_values = []
    for i in range(0, 38):
        df_house_anomaly = pd.read_csv(data_dir_path + '/anomaly_4_f_csv/house_{}_anomaly.csv'.format(i), header=None)
        scaled_house_anomaly = scaler.transform(df_house_anomaly)
        out_anomaly = ae.predict_mine(scaled_house_anomaly)
        mae_anomaly = mean_absolute_error(scaled_house_anomaly, out_anomaly, multioutput='raw_values')
        n_mae_anomaly = np.divide(mae_anomaly, mae_train)
        mae_values.append(np.mean(n_mae_anomaly))
        print(i, np.mean(mae_anomaly), np.mean(n_mae_anomaly))
    visualize_reconstruction_error(mae_values, 2)
    print("-------------ANOMALY-RSME--------------------------------------")
    rsme_values = []
    for i in range(0, 38):
        df_house_anomaly = pd.read_csv(data_dir_path + '/anomaly_4_f_csv/house_{}_anomaly.csv'.format(i), header=None)
        scaled_house_anomaly = scaler.transform(df_house_anomaly)
        out_anomaly = ae.predict_mine(scaled_house_anomaly)
        rsme_anomaly = np.sqrt(mean_squared_error(scaled_house_anomaly, out_anomaly, multioutput='raw_values'))
        n_rsme_anomaly = np.divide(rsme_anomaly, rsme_train)
        rsme_values.append(np.mean(n_rsme_anomaly))
        print(i, np.mean(rsme_anomaly), np.mean(n_rsme_anomaly))
    visualize_reconstruction_error(rsme_values, 2)
def mae(original, predicted):
    return mean_absolute_error(original, predicted, multioutput='raw_values')


def rsme(original, predicted):
    return np.sqrt(mean_squared_error(original, predicted, multioutput='raw_values'))


if __name__ == '__main__':
    main()