ADIOS - I-NERGY - Label Extend
This container provides a label extension service for a tagged dataset, using a voting algorithm.
The label extension mechanism uses the similarity between alarms to associate each unknown alarm with its most similar known one. A reduced portion of the overall dataset (50k alarms) is used to extend the training set. The features of the dataset are mainly string fields, except for the Priority field, which is numerical. The similarity between any two alarms is measured by the number of features on which they differ.
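The similarity measure described above can be sketched as follows. This is a minimal illustration, not the code shipped in the container: the helper names (count_differences, most_similar_label), the feature names other than Priority, and the label values are all assumptions made for the example. It also covers only the nearest-known-alarm step, not the voting mechanism.

```python
from typing import Dict, List, Optional

Alarm = Dict[str, object]


def count_differences(a: Alarm, b: Alarm, features: List[str]) -> int:
    """Count the features on which two alarms differ (a Hamming-style distance)."""
    return sum(1 for f in features if a.get(f) != b.get(f))


def most_similar_label(
    unknown: Alarm,
    known: List[Alarm],
    features: List[str],
    label_key: str = "label",  # hypothetical column name
) -> Optional[object]:
    """Return the label of the known alarm that differs in the fewest features."""
    if not known:
        return None
    # min() keeps the first alarm on ties, so list order acts as the tie-breaker.
    best = min(known, key=lambda k: count_differences(unknown, k, features))
    return best[label_key]


# Hypothetical feature names and labels, for illustration only.
features = ["Source", "Type", "Priority"]
known = [
    {"Source": "pump_1", "Type": "overheat", "Priority": 3, "label": "critical"},
    {"Source": "valve_2", "Type": "leak", "Priority": 1, "label": "minor"},
]
unknown = {"Source": "pump_1", "Type": "overheat", "Priority": 2}

predicted = most_similar_label(unknown, known, features)
```

Here the unknown alarm differs from the first known alarm only in Priority and from the second in all three features, so it takes the label of the first. Treating Priority as a plain equality check, like the string fields, is a simplification in this sketch.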
The adios_label_extend.py script accepts two arguments: input_file, the labeled dataset to be extended, and output_file, the name of the extended labeled dataset that is produced. If no arguments are passed, the defaults are used (tagged_10k.csv and tagged_10k_extended.csv).
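One way such argument handling could look is sketched below with the standard argparse module. Whether the real script takes positional arguments or flags is not stated here, so the positional form and the parse_args helper are assumptions. Only the argument names and default file names come from the description above.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two optional positional arguments, falling back to the defaults."""
    parser = argparse.ArgumentParser(description="Extend a labeled alarm dataset.")
    parser.add_argument(
        "input_file",
        nargs="?",  # optional positional: the default applies when omitted
        default="tagged_10k.csv",
        help="labeled dataset to be extended",
    )
    parser.add_argument(
        "output_file",
        nargs="?",
        default="tagged_10k_extended.csv",
        help="name of the extended labeled dataset to write",
    )
    return parser.parse_args(argv)


# With no arguments, both defaults are used.
defaults = parse_args([])
# With both arguments, the given names override the defaults.
custom = parse_args(["my_alarms.csv", "my_alarms_extended.csv"])
```

Under this assumed layout, passing a single argument sets input_file only and leaves output_file at its default.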
The module first reads the source file into memory and then calls the extend_labeled_dataset service. The service runs the ML script and saves the results to a CSV file, which is passed to the next node through a shared folder.
After the output file is saved, the gRPC server starts and is ready to send the name of the generated file.
The Protobuf definition contains a single message type, File, which holds the name of the output file produced by the module. The LabelExtendService service has one method, get_file, which returns this File message.
The Docker container includes a label extension system with 3 different algorithms combined through a voting mechanism:
Label Propagation
Label Spreading