---
language: en
library_name: LogClassifier
tags:
- log-classification
- log feature
- log-similarity
- transformers
- AIOps
pipeline_tag: text-classification
---

# log-classifier-BERT-v1

This is a Transformers sequence-classification model, trained with `BertForSequenceClassification` and designed for network and device log mining tasks.

Developed by [Selector AI](https://www.selector.ai/)

## Model Usage

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Step 1: Load the model and tokenizer from Hugging Face
model = BertForSequenceClassification.from_pretrained("rahulm-selector/log-classifier-BERT-v1")
tokenizer = BertTokenizer.from_pretrained("rahulm-selector/log-classifier-BERT-v1")
model.eval()  # inference mode: disables dropout

# Step 2: Prepare the input data (example log text)
log_text = "Error occurred while accessing the database."

# Tokenize the input data
inputs = tokenizer(log_text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Step 3: Make predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Step 4: Get the predicted class (the class with the highest score)
predicted_class = torch.argmax(logits, dim=1).item()

# Label mapping (can be loaded from the JSON file in the repo or from the config)
label_mapping = model.config.id2label

# Step 5: Get the event name
predicted_event = label_mapping[predicted_class]
print(f"Predicted Event: {predicted_event}")
```
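
Step 4 keeps only the winning class and discards its score. When a confidence value or a short list of candidate events is useful, the logits can be passed through a softmax first. The following is a minimal sketch in plain Python so that it runs without downloading the model; `top_k_events`, the example logits, and the `event_*` label names are hypothetical and are not part of the model's API.

```python
import math

def top_k_events(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs for one log line."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class ids by probability, highest first, and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and label names, for illustration only.
example_logits = [0.2, 3.1, -1.0, 1.4]
example_labels = {0: "event_a", 1: "event_b", 2: "event_c", 3: "event_d"}

for label, prob in top_k_events(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, `logits[0].tolist()` and `model.config.id2label` would presumably take the place of the example values.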

## Background

The model focuses on structured and semi-structured log data and outputs one of around 60 event categories. It is suited to
real-time log analysis, anomaly detection, and operational monitoring: by automatically classifying logs into
predefined categories, it helps organizations manage large-scale network data and diagnose network issues
faster and more accurately.

## Intended uses

The model is intended to be used as a classifier. Given an input text (a log coming from a network, device, or router), it outputs the event most closely associated with that log.
The possible events are listed in [encoder-main.json](https://huggingface.co/rahulm-selector/log-classifier-BERT-v1/blob/main/encoder-main.json).
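
If the event list is read from a JSON file rather than from `model.config.id2label`, the mapping has to be turned into an index-to-name lookup. The sketch below assumes the file is a flat JSON object that maps event names to integer ids; that schema is an assumption and should be checked against the actual `encoder-main.json`. The helper name `load_id2label` and the `event_*` names are hypothetical.

```python
import json

def load_id2label(json_text):
    """Build an id -> event-name dict from a JSON object of name -> id pairs."""
    name_to_id = json.loads(json_text)
    # Invert the mapping so a predicted class index can be looked up directly;
    # int() guards against ids that were stored as strings.
    return {int(idx): name for name, idx in name_to_id.items()}

# Hypothetical file contents; the real encoder-main.json may be shaped differently.
sample_json = '{"event_a": 0, "event_b": 1, "event_c": 2}'
id2label = load_id2label(sample_json)
print(id2label[1])
```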

## Training Details

### Data

The model was trained on a variety of network events and system logs, focusing on monitoring and analyzing state changes,
protocol behaviors, and hardware interactions across infrastructure components. This included tracking routing issues,
protocol neighbor state changes, link stability, and security events, ensuring that the model could recognize and
classify critical patterns in device communications, network health, and configuration activities.

### Train/Test Split

- **Train Data Size**: `~80K Logs`
- **Test Data Size**: `~20K Logs`
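
The sizes above correspond to roughly an 80/20 split. The card does not say how the split was produced, so the following is only a sketch of one common approach, a seeded random shuffle; `split_logs` is a hypothetical helper and the log lines are synthetic.

```python
import random

def split_logs(logs, test_fraction=0.2, seed=0):
    """Shuffle a list of log lines and split it into train and test lists."""
    shuffled = list(logs)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Synthetic stand-in for a labelled log corpus.
logs = [f"log line {i}" for i in range(1000)]
train_logs, test_logs = split_logs(logs)
print(len(train_logs), len(test_logs))
```

With around 60 event categories, a split stratified by label may be preferable so that rare events appear in both sets.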

### Hyperparameters

The following hyperparameters were used during training:

- **Batch Size**: `32`
- **Learning Rate**: `0.001`
- **Optimizer**: `Adam`
- **Epochs**: `10`
- **Dropout Rate**: N/A
- **LSTM Hidden Dimension**: `384`
- **Embedding Dimension**: `384`
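
To give a sense of the training budget these settings imply, the sketch below collects the listed values in a small config object and computes the number of optimizer steps for the approximate train size of 80K logs. `TrainConfig` and `total_steps` are illustrative names, and the calculation assumes one optimizer step per batch with no gradient accumulation.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in the model card."""
    batch_size: int = 32
    learning_rate: float = 0.001
    optimizer: str = "Adam"
    epochs: int = 10

def total_steps(config, num_train_examples):
    """Optimizer steps over the whole run, counting a final partial batch."""
    steps_per_epoch = math.ceil(num_train_examples / config.batch_size)
    return steps_per_epoch * config.epochs

config = TrainConfig()
# 80_000 is the approximate train size stated above.
print(total_steps(config, 80_000))
```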