A machine learning approach for fault detection of energy consumption values in an environment.
- MATLAB (https://www.mathworks.com/products/matlab.html).
- The MathWorks Database Toolbox (https://www.mathworks.com/products/database.html).
- The Bayesian Network Toolbox (BNT) for MATLAB (https://github.com/bayesnet/bnt).
- Optional: a SQLite browser to browse the database, e.g., https://sqlitebrowser.org/.
Download the Bayesian Net Toolbox (BNT) by typing `make` in a UNIX terminal.
Open MATLAB, navigate to this working folder, and load the BNT library by running:
loadBNTlibrary
To generate the Bayesian Network(s), run:
dataAcquisition
To compute the Conditional Probability Tables (CPTs) of each node of the Bayesian Network and to compute the Full Joint Distribution of the Bayesian Network:
fullJointDistribution
The environment consists of a room of about 20 square meters that is intended to be used as a chill-out zone.
As depicted in the figure below, there is only one door to access the room. On the opposite side there is a wide window. In the bottom-right corner there are three electrical appliances: a microwave, a kettle, and a water dispenser.
Sofas are placed along the bottom wall and the left wall, whereas coffee tables and chairs occupy the center of the room.
The room is equipped with the following sensors:
- an open/close sensor for the window;
- a temperature and humidity sensor near the window;
- a temperature and movement sensor near the door;
- an energy consumption sensor, called Z-Plug, into which all the electrical appliances are plugged.
These sensors communicate with a central unit through the (wireless) ZigBee protocol, and the collected data are saved in a SQLite database.
We use Bayesian networks to represent the dependencies among the data gathered by the four sensors introduced above, so that we can design Bayesian inference procedures such as fault-detection systems.
Bayesian networks are data structures that map the relationships between events in terms of their probabilities. More specifically, a Bayesian network is a directed acyclic graph where each node corresponds to a random variable (either discrete or continuous) and each directed arc from node X to node Y means that Y is conditionally dependent on X, i.e., X is a parent of Y. This conditional dependency is defined by a Conditional Probability Distribution (CPD) that, for each node, quantifies the effect of the parents on that node.
If the random variable is discrete, the CPD can be represented as a Conditional Probability Table (CPT). The CPT lists the probability that a node takes on each of its different values for each combination of values of its parents.
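As a concrete illustration of a CPT, the following minimal sketch stores the table of a binary node with a single binary parent as a MATLAB matrix. The node roles, the 1 = false / 2 = true value coding, and all probabilities are assumptions made up for this example; they are not taken from the network or the data of this project.

```matlab
% Hypothetical CPT for a binary child node with one binary parent.
% Rows index the parent's value, columns the child's value
% (assumed coding: 1 = false, 2 = true).
% The probabilities are made-up illustrative numbers, not learned values.
cpt = [0.9 0.1;    % P(child | parent = false)
       0.3 0.7];   % P(child | parent = true)

% Each row is a distribution over the child's values, so it must sum to 1.
rowSums = sum(cpt, 2);
assert(all(abs(rowSums - 1) < 1e-12));

% Look up P(child = true | parent = true).
pChildTrue = cpt(2, 2);
```

With more parents, the table gains one dimension per parent, and one distribution over the child is stored for every combination of parent values.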
The topology (structure) of the Bayesian network and the parameters of each CPD can both be learned from data. However, since learning the structure is much harder than learning the parameters of the CPDs, we have designed the topology of the network according to the method given in Section 14.2 of the book Artificial Intelligence: A Modern Approach.
The aim of our Bayesian Network is to model the environment (the room) using the data gathered from all the sensors.
We define six observable nodes that model the output of each sensor and one hidden node that models the possible presence of a person inside the room.
Number | Name | Modelled data |
---|---|---|
1 | Movement | Motion detection (binary) |
2 | Presence (Hidden Node) | Presence inside the room (binary) |
3 | WindowOpen | Window open (binary) |
4 | Z-Plug | Energy consumption (Watt) |
5 | TemperatureDoor | Temperature near the door (Celsius) |
6 | Humidity | Relative humidity (percent) |
7 | TemperatureWindow | Temperature near the window (Celsius) |
The picture below shows the structure of our Bayesian network.
In order to perform statistical inference on a Bayesian network, it is necessary to perform a learning phase based on the available data.
The learning function of BNT needs a matrix with the samples of the collected data. Each row of the matrix represents a node, whereas each column represents a sample gathered from every sensor at a given point in time.
As the sensors in the room provide samples in various formats and with different timestamps, we have performed the following steps to collect the data consistently:
1. We have defined a timeline T of timestamps based on when a new sample is gathered by the TemperatureDoor sensor. The sampling rate is 1 sample every 3 minutes. (We note that the choice of the specific sensor is completely arbitrary; what matters is to fix a unique timeline and then adapt the samples of the other sensors to it.)
2. For each timestamp t in the timeline T defined at point 1., we have extracted the samples of the Humidity sensor, the TemperatureWindow sensor, and the Z-Plug energy consumption sensor whose timestamps are the closest to t.
3. For each timestamp t in the timeline T defined at point 1., we have set the value of the Movement and WindowOpen sensors to 1 if, in the last 3 minutes, the corresponding sensor has been activated at least once.
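The alignment steps above can be sketched in plain MATLAB as follows. This is only an illustration of the idea, not the code of the `dataAcquisition` script: all variable names are hypothetical, timestamps are expressed as minutes since an arbitrary start, and the readings are made-up numbers.

```matlab
% Hypothetical inputs (times in minutes since the start of the recording).
tRef   = [0 3 6 9];        % timeline T: one reference timestamp every 3 minutes
tHum   = [1 2.5 7 8.6];    % timestamps of the Humidity samples (made-up)
hum    = [40 41 43 44];    % Humidity readings in percent (made-up)
tMove  = [2.2 8.1];        % instants at which the movement sensor fired (made-up)
window = 3;                % look-back window in minutes

humAligned = zeros(size(tRef));
movement   = zeros(size(tRef));
for k = 1:numel(tRef)
    % Nearest-timestamp step: pick the sample closest in time to tRef(k).
    [~, idx] = min(abs(tHum - tRef(k)));
    humAligned(k) = hum(idx);
    % Event step: 1 if the sensor fired at least once in the last 3 minutes.
    movement(k) = any(tMove > tRef(k) - window & tMove <= tRef(k));
end

% One row per node, one column per reference timestamp.
data = [movement; humAligned];
```

Here the half-open interval (t - 3, t] is an assumption about how "the last 3 minutes" is bounded; the actual script may treat the boundaries differently.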
In the MATLAB command window, run:
dataAcquisition
Inside the script you can change the time period:
- earliest date: '2012-06-26 00:00:00'
- latest date: '2012-07-28 00:00:00'
In particular, the script calls the function `dataAcquisition/computeBnet.m`, which creates a matrix [8 x #samples] representing the Bayesian Network. Each row contains the values of a node over time:
Row | Sensor | Unit |
---|---|---|
1 | movementDetected | binary |
2 | windowOpen | binary |
3 | KettleOn | binary |
4 | WaterDispenserOn | binary |
5 | MicrowaveOn | binary |
6 | TemperatureWindow | Celsius |
7 | RelativeHumidity | percent |
8 | TemperatureDoor | Celsius |
All the data are saved in the MATLAB-formatted data file `bNet_data.mat`.
To compute the Conditional Probability Tables (CPTs) of each node of the Bayesian Network and to compute the Full Joint Distribution of the Bayesian Network:
fullJointDistribution
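To give an idea of what estimating a CPT and a full joint distribution involves, here is a minimal sketch for a hypothetical two-node network (one binary parent, one binary child) that uses simple counting in plain MATLAB. It is not the `fullJointDistribution` script, which relies on BNT and on the full network; the arc, the 1 = false / 2 = true coding, and the samples are assumptions made up for the example.

```matlab
% Made-up binary samples: row 1 = parent node, row 2 = child node,
% one column per sample (assumed coding: 1 = false, 2 = true).
data = [1 1 1 2 2 2 2 1;
        1 1 2 2 2 1 2 1];
nSamples = size(data, 2);

% counts(i, j) = number of samples with parent = i and child = j.
counts = zeros(2, 2);
for m = 1:nSamples
    p = data(1, m);
    c = data(2, m);
    counts(p, c) = counts(p, c) + 1;
end

% Maximum-likelihood estimates obtained by normalizing the counts.
prior = sum(counts, 2) / nSamples;               % P(parent)
cpt   = counts ./ repmat(sum(counts, 2), 1, 2);  % P(child | parent), rows sum to 1

% Full joint via the chain rule: P(parent, child) = P(parent) * P(child | parent).
joint = repmat(prior, 1, 2) .* cpt;
```

For a larger network the same chain rule applies node by node: the full joint is the product of every node's CPT entry given the values of its parents.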