![dataset-cover_5.PNG](dataset-cover_5.PNG)

Introduction

The shipping industry and modern factories face significant challenges in the maintenance and reliability of their equipment – whether marine engines or industrial machines – which are essential for safe and efficient operations. Unexpected failures can cause operational delays, increased repair costs, environmental risks and direct threats to the safety of operators and crew, as shown by recent studies (Marques & Brito, 2019).

In a vulnerable environment, extreme conditions and system complexity accelerate component wear, raising the likelihood of failures when problems are not identified in time (Macnica DHW, 2023). Likewise, in industrial facilities, machines operate under intense conditions – load variations, exposure to aggressive environments and high temperatures – which also accelerate component wear and compromise the continuity of production.

To mitigate these impacts in both sectors, failure prediction and the implementation of predictive maintenance become essential. This approach allows for scheduled interventions that reduce downtime, optimize operational costs and extend the useful life of equipment (Filtrovali, 2019).

Furthermore, the use of advanced technologies – such as IoT sensors, vibration analysis, thermography and artificial intelligence algorithms – has revolutionized the way data is collected and analyzed. These systems enable continuous monitoring of critical operational parameters, allowing anomalies to be detected at an early stage in both the naval and industrial sectors. Thus, it is possible to act before catastrophic failures occur, ensuring safer operation, increasing asset reliability and significantly reducing corrective maintenance costs.

With the digitalization of the naval industry and the advancement of the Internet of Things (IoT), embedded sensors are widely used to monitor critical operational variables, such as:

- Engine temperature
- Vibration levels
- Lube oil pressure
- Fuel consumption
- Level of mechanical wear
- Occurrence of leaks

Analyzing this data allows you to predict failures before they occur, reducing downtime and optimizing maintenance planning.

Project Summary

Smart Engine AI is an artificial intelligence system that combines autonomous agents, machine learning and physical modeling to transform the monitoring and maintenance of industrial and marine engines. It uses a multi-agent approach to analyze sensor data in real time, predict failures and automate decisions, reducing operational costs and optimizing equipment reliability.

Autonomous Agents:

The solution employs a multi-agent architecture, with agents specialized in different tasks:

- Real-Time Engine Data Monitor: Continuously collects, processes, and visualizes engine sensor data, providing real-time insights into engine health and performance.
- Engine Failure Prediction Expert: Utilizes advanced machine learning models to analyze sensor data, detecting patterns that indicate potential failures before they occur.
- Engine Performance Reporting Analyst: Transforms raw engine data into comprehensive reports, highlighting key performance metrics and actionable insights for optimization.
- Engine Maintenance Advisor: Leverages Retrieval-Augmented Generation (RAG) techniques to access and analyze technical manuals, providing highly accurate maintenance recommendations based on real-time diagnostics and historical data.

These agents communicate and coordinate with each other, allowing dynamic adjustments to engine operation and anticipating problems before they become critical.

Technologies Used

- LLMs (Large Language Models): Assist in analyzing and interpreting data to recognize patterns and provide maintenance recommendations.
- Machine Learning Pipeline: Uses trained models to classify the state of the engines and predict failures.
- Advanced Sensor Simulation: As this is a theoretical study, we use mathematical functions to simulate combustion engine sensor parameters, aiming for at least a minimal representation of real operating conditions.
- Report Automation: Agents generate statistical and visual reports on engine operation to support decision making.
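To make the sensor simulation idea concrete, here is a minimal sketch. It is not the project's actual generator: the function name `simulate_oil_pressure`, the nominal value, the noise level and the clipping limits are all illustrative assumptions, chosen only to show how a mathematical function can produce bounded, noisy readings.

```python
import random


def simulate_oil_pressure(n, nominal=7.3, noise=0.75, low=5.0, high=8.0, seed=42):
    """Hypothetical generator: Gaussian noise around a nominal value, clipped to [low, high]."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    readings = []
    for _ in range(n):
        value = rng.gauss(nominal, noise)
        # Clip to the assumed physical limits of the sensor
        readings.append(min(max(value, low), high))
    return readings


readings = simulate_oil_pressure(1000)
print(f"min={min(readings):.2f}, max={max(readings):.2f}")
```

Clipping keeps every simulated value inside a plausible operating range, at the cost of piling readings up at the limits, which is worth keeping in mind when interpreting distributions of simulated data.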

Solution Structure

- Data Collection and Processing: Sensors capture variables such as temperature, vibration, pressure, fuel consumption, wear and leaks, feeding the machine learning model.
- Fault Prediction and Intelligent Analysis: Agents use the machine learning model's predictions of possible engine failures to detect anomalies and suggest corrective actions during operation, aiming at performance and safe operation of the equipment.
- Visualization and Decision Making: At the end of the operation, an automated report presents statistics and trend graphs on the condition of the engines during the period of operation.

Expected Results

- Cost Reduction: Minimizes emergency maintenance and unexpected failures.
- Greater Reliability and Safety: Anticipating problems reduces risks and increases operational safety.
- Operational Efficiency: Improves engine performance and fuel consumption.
- Data-Based Strategic Decisions: Facilitates planning and allocation of maintenance resources.

--DIVIDER--```python
# ========== Data Manipulations ==========
import pandas as pd
import numpy as np

# ========== Visualization ==========
import matplotlib.pyplot as plt
import seaborn as sns

# ========== Statistical ==========
from scipy.stats import mannwhitneyu, chi2_contingency
import scipy.stats as stats

# ========== Machine Learning ==========
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# ========== Data Preprocessing ==========
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.impute import SimpleImputer

# ========== Data and Assessment Division ==========
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report, confusion_matrix

# ========== File and Template Management ==========
import joblib
import os

# ========== Automation and AI ==========
from crewai import Agent, Task, Crew
import openai
from langchain_openai import ChatOpenAI
from crewai_tools import DOCXSearchTool

# ========== Other Utilities ==========
import datetime
import time
import random
import json
import re

# ========== Environment Variable Management ==========
from dotenv import load_dotenv

# ========== Warning Suppression ==========
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)
warnings.simplefilter(action="ignore", category=FutureWarning)
```
--DIVIDER--## **1. Exploratory Data Analysis**

```python
data = pd.read_csv(r'C:\Users\z004hn4c\Documents\Pump_Project\marine_engine_data_1.csv')
display(data.head())
```

| timestamp | engine_id | engine_temp | oil_pressure | fuel_consumption | vibration_level | rpm | engine_load | coolant_temp | exhaust_temp | running_period | fuel_consumption_per_hour | engine_type | fuel_type | manufacturer | failure_mode | severity |
|------------|-----------|-------------|--------------|------------------|-----------------|------------|-------------|--------------|--------------|----------------|--------------------------|--------------------------|-----------|--------------|-------------------|------------------------|
| 2023-01-01 | ENG_001 | 79.816406 | 7.049409 | 1000.000000 | 4.366612 | 1770.214578 | 42.472407 | 78.323108 | 450.0 | 49.741791 | 100.0 | 4-stroke High-Speed | Diesel | MAN B&W | No Failure | Normal |
| 2023-01-08 | ENG_001 | 98.982068 | 8.000000 | 6308.623817 | 3.732792 | 1677.238238 | 77.042858 | 100.000000 | 450.0 | 94.351515 | 100.0 | 2-stroke Low-Speed | Diesel | Mitsubishi | Overheating | Critical |
| 2023-01-15 | ENG_001 | 83.918153 | 8.000000 | 6444.402260 | 4.061372 | 1487.472085 | 63.919637 | 78.178337 | 450.0 | 120.095804 | 100.0 | 2-stroke Medium-Speed | Diesel | Caterpillar | Fuel Issues | Requires Maintenance |
| 2023-01-22 | ENG_001 | 81.887081 | 7.601603 | 4439.946613 | 3.999554 | 1548.624692 | 55.919509 | 82.896344 | 450.0 | 122.321555 | 100.0 | 2-stroke Medium-Speed | Diesel | MAN B&W | No Failure | Normal |
| 2023-01-29 | ENG_001 | 78.550429 | 6.233033 | 3146.234038 | 4.520559 | 1441.151499 | 29.361118 | 80.791150 | 450.0 | 111.978460 | 100.0 | 4-stroke High-Speed | Diesel | Wärtsilä | Mechanical Wear | Critical |
--DIVIDER--```python
# Checking the format of the columns:
data.info()
```

- **Total Entries:** 5200
- **Total Columns:** 17
- **Memory Usage:** 690.8+ KB

| # | Column | Dtype | Non-Null Count |
|----|------------------------------|----------|-------------------|
| 0 | timestamp | object | 5200 |
| 1 | engine_id | object | 5200 |
| 2 | engine_temp | float64 | 5200 |
| 3 | oil_pressure | float64 | 5200 |
| 4 | fuel_consumption | float64 | 5200 |
| 5 | vibration_level | float64 | 5200 |
| 6 | rpm | float64 | 5200 |
| 7 | engine_load | float64 | 5200 |
| 8 | coolant_temp | float64 | 5200 |
| 9 | exhaust_temp | float64 | 5200 |
| 10 | running_period | float64 | 5200 |
| 11 | fuel_consumption_per_hour | float64 | 5200 |
| 12 | engine_type | object | 5200 |
| 13 | fuel_type | object | 5200 |
| 14 | manufacturer | object | 5200 |
| 15 | failure_mode | object | 5200 |
| 16 | severity | object | 5200 |

- **float64:** 10 columns
- **object:** 7 columns

--DIVIDER--As you can see, we do not have empty columns in the dataset.

### **1.1 Analysis of Fault Distributions**

```python
def plot_bar_with_counts(data: pd.DataFrame, column: str, color: str):
    """
    Plots a bar chart with totals at the top of each bar for the chosen column.

    Parameters:
    - data (pd.DataFrame): DataFrame containing the data.
    - column (str): Name of the categorical column to be plotted.
    - color (str): Color of the bars.

    Returns:
    - Displays a bar graph.
    """
    # Count occurrences of each category in the chosen column
    value_counts = data[column].value_counts()

    # Create the bar chart
    plt.figure(figsize=(24, 10))
    bars = plt.bar(value_counts.index, value_counts.values, color=color)

    # Add the values to the top of each bar
    for bar in bars:
        plt.text(
            bar.get_x() + bar.get_width() / 2,
            bar.get_height(),
            str(bar.get_height()),
            ha="center", va="bottom", fontsize=12, fontweight="bold"
        )

    # Chart settings
    plt.xlabel(column)
    plt.ylabel("Quantity")
    plt.title(f"Distribution of {column}")
    plt.xticks(rotation=45, ha="right")
    plt.grid(axis="y", linestyle="--", alpha=0.7)

    # View the chart
    plt.show()
```
--DIVIDER--```python
plot_bar_with_counts(data, "failure_mode", color='red')
```

![grafico_1.png](grafico_1.png)

1.1.1 The largest share of records (41.6%) corresponds to fault-free engines.

Due to the predominance of the No Failure class, a model trained on this dataset may develop a bias to predict this category with high frequency. If this imbalance is not adequately addressed, the model may exhibit high overall accuracy, but poor performance in detecting real faults.
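A quick way to see why accuracy alone can mislead: the sketch below scores a trivial baseline that always predicts `No Failure`, using the per-class counts reported in this analysis. The variable names are illustrative, and the baseline is a thought experiment rather than one of the project's models.

```python
# Per-class record counts reported in this analysis (5200 records in total)
class_counts = {
    "No Failure": 2165,
    "Mechanical Wear": 1042,
    "Fuel Issues": 971,
    "Overheating": 644,
    "Oil Leakage": 378,
}
total = sum(class_counts.values())

# Baseline: always predict the most frequent class
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / total

# Recall of this baseline: 1.0 for the predicted class, 0.0 for every other class
baseline_recall = {c: (1.0 if c == majority_class else 0.0) for c in class_counts}
macro_recall = sum(baseline_recall.values()) / len(baseline_recall)

print(f"accuracy = {baseline_accuracy:.3f}, macro recall = {macro_recall:.3f}")
```

The baseline gets every fault-free record right while detecting no fault at all, which is why per-class recall is a more informative metric here than overall accuracy.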

1.1.2 Mechanical Wear (20.0%) and Fuel Issues (18.7%) are the most common failures.

Mechanical wear and fuel-related problems represent more than half of recorded failures, suggesting that these are the main risk factors in engine operation.
To be useful in predictive maintenance, the model needs to have high accuracy in these categories, ensuring reliable diagnoses.

1.1.3 Less common failures, such as Oil Leakage and Overheating, represent only 19.7% of cases.

Although less frequent, these failures can be critical for engine operation.
Overheating, for example, can lead to severe damage and catastrophic failure if not detected in time.

1.1.4 The dataset presents a significant imbalance between the classes

The model may have difficulty detecting less represented failures, favoring predictions for more frequent classes (No Failure, Mechanical Wear and Fuel Issues).
This behavior can compromise the reliability of the monitoring and predictive maintenance system.
To mitigate this bias, it is essential to apply class balancing techniques, such as oversampling or undersampling, to assign class weights, and to prioritize metrics such as recall, so that critical failures are not missed.
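As an illustration of the class-weight option, the sketch below derives inverse-frequency weights from the class counts reported in this analysis. It uses the formula `n_samples / (n_classes * class_count)`, the same heuristic behind scikit-learn's `class_weight="balanced"` setting. The variable names are illustrative, and whether these exact weights suit the final model is an assumption to validate.

```python
# Per-class record counts reported in this analysis
class_counts = {
    "No Failure": 2165,
    "Mechanical Wear": 1042,
    "Fuel Issues": 971,
    "Overheating": 644,
    "Oil Leakage": 378,
}
n_samples = sum(class_counts.values())
n_classes = len(class_counts)

# Inverse-frequency weights: rare classes receive larger weights
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in class_counts.items()
}

# Print from the smallest to the largest weight
for label, weight in sorted(class_weights.items(), key=lambda kv: kv[1]):
    print(f"{label:<16} {weight:.2f}")
```

A dictionary like this can be passed as the `class_weight` argument of estimators such as `RandomForestClassifier` or `LogisticRegression`, provided the keys match the label values used for training.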

When exploring the remaining categorical variables, the distribution of failures will be included to assess whether any category has a higher failure rate and to identify possible patterns. For this, `plot_bar_with_counts` will be complemented by `plot_bar_with_hue`:

--DIVIDER--```python
def plot_bar_with_hue(data: pd.DataFrame, column: str, hue: str, palette: str = "Set2"):
    """
    Plots a bar chart with totals at the top of each bar, segmented by a hue category.

    Parameters:
    - data (pd.DataFrame): DataFrame containing the data.
    - column (str): Name of the categorical column to be plotted.
    - hue (str): Column used for segmentation (e.g., "failure_mode").
    - palette (str): Color palette for different categories (default: "Set2").

    Returns:
    - Displays a bar graph.
    """
    plt.figure(figsize=(16, 8))

    # Create a complete DataFrame with all possible column and hue combinations
    all_combinations = pd.MultiIndex.from_product(
        [data[column].unique(), data[hue].unique()], names=[column, hue]
    )
    counts = (
        data.groupby([column, hue])
        .size()
        .reindex(all_combinations, fill_value=0)
        .reset_index(name="count")
    )

    # Create the bar chart
    ax = sns.barplot(data=counts, x=column, y="count", hue=hue, palette=palette)

    # Add values to the top of each bar (without displaying zeros)
    for p in ax.patches:
        if p.get_height() > 0:  # Only display non-zero values
            ax.annotate(
                format(p.get_height(), ".0f"),
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha="center", va="bottom", fontsize=12, fontweight="bold"
            )

    # Chart settings
    plt.xlabel(column)
    plt.ylabel("Quantity")
    plt.title(f"Distribution of {column} by {hue}")
    plt.xticks(rotation=45, ha="right")
    plt.grid(axis="y", linestyle="--", alpha=0.7)
    plt.legend(title=hue)

    # Display the graph
    plt.show()
```
--DIVIDER--### **1.2 Analysis of Maintenance Status Distributions**

```python
plot_bar_with_hue(data=data, column="severity", hue="failure_mode")
```

![grafico_2.png](grafico_2.png)

**Severity Distribution Analysis**

- **`Normal` is the largest severity class (41.6%)**
  - The largest group of records is in **normal operating condition**, with no recorded failures (**2165 occurrences**).
  - This suggests that the equipment functions correctly in a large share of the observations.
- **`Critical` records (32.4%) are driven primarily by mechanical wear**
  - **Mechanical Wear** is the **leading cause of critical failures**, accounting for **1042 occurrences**.
  - This highlights the importance of **preventive maintenance** to mitigate wear-related breakdowns.
- **Overheating is the second most common cause of critical failures**
  - **Overheating** appears in **644 cases**, making it a significant failure mode.
  - This suggests that **many engines may have cooling system inefficiencies**, which require further investigation.
- **The `Requires Maintenance` category (about 26%) is strongly linked to fuel and oil leak issues**
  - **Fuel Issues** account for **971 records**, indicating a **high impact on maintenance needs**.
  - **Oil Leakage** is less frequent (**378 occurrences**) but remains an important factor.
- **Failure patterns based on severity:**
  - **Critical Failures** → Mainly associated with **mechanical wear** and **overheating**.
  - **Requires Maintenance** → Primarily linked to **fuel system problems** and **oil leaks**.
- **Class imbalance consideration:**
  - The **Normal** category is the largest in the dataset, which might lead predictive models to favor this class.
  - The **Critical** and **Requires Maintenance** categories, while significant, may require **rebalancing techniques** (e.g., oversampling, weighting) to improve model performance.
- **Preventive maintenance strategies:**
  - **Mechanical Wear and Overheating** should be prioritized in **critical failure prevention plans**.
  - **Fuel system checks and oil leakage inspections** should be reinforced to reduce maintenance needs.
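The severity shares discussed above can be reproduced from the occurrence counts. The sketch below hard-codes the counts quoted in this section (the nested dictionary and its name are illustrative) and computes each severity level's share of all records. With the DataFrame loaded, `pd.crosstab(data["severity"], data["failure_mode"])` should yield the underlying table directly.

```python
# Occurrence counts quoted in this section, grouped by severity level
severity_counts = {
    "Normal": {"No Failure": 2165},
    "Critical": {"Mechanical Wear": 1042, "Overheating": 644},
    "Requires Maintenance": {"Fuel Issues": 971, "Oil Leakage": 378},
}

total = sum(sum(modes.values()) for modes in severity_counts.values())

# Share of all records that falls into each severity level, in percent
severity_share = {
    severity: 100 * sum(modes.values()) / total
    for severity, modes in severity_counts.items()
}

for severity, share in severity_share.items():
    print(f"{severity:<22} {share:.1f}%")
```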
--DIVIDER--### **1.3 Analysis of Fuel Type Distributions**

```python
plot_bar_with_hue(data=data, column="fuel_type", hue="failure_mode")
```

![grafico_6.png](grafico_6.png)

```python
plot_bar_with_counts(data, "fuel_type", color='green')
```

![grafico_3.png](grafico_3.png)

**Fuel Type Distribution Analysis**

**1.3.1 Higher number of failures in Diesel engines**

- The total number of failures in Diesel engines (1797) is greater than that in HFO engines (1238).
- Because the failure-free share is nearly identical for both fuels (see 1.3.3), this difference mainly reflects the larger number of Diesel records in the analyzed fleet rather than a greater tendency to fail.

**1.3.2 Mechanical Wear as a Major Concern**

- Mechanical wear is the most common failure mode in both Diesel (612 cases) and HFO (430 cases) engines.
- This suggests that maintenance efforts should focus on reducing mechanical wear-related issues.

**1.3.3 Fewer failures recorded in HFO engines, but a similar failure-free share**

- The `No Failure` category has almost the same proportion in HFO engines (41.5%) as in Diesel engines (41.7%).
- This may indicate that HFO and Diesel engines have a similar reliability rating and/or a common operating profile.

**1.3.4 Specific differences between faults**

- Fuel Issues are more common in `Diesel` engines (606 vs. 365 in `HFO`), possibly due to differences in fuel quality or injection systems.
- `Oil Leakage` and `Overheating` follow a similar pattern between the two types of fuel, but are still more frequent in Diesel.
--DIVIDER--### **1.4 Analysis of Engine Type Distributions**

The distribution of failures by engine type reveals important information about engine behavior and the frequency of failures associated with each type.
Below, we can highlight some key points from this analysis:

```python
plot_bar_with_hue(data=data, column="engine_type", hue="failure_mode")
```

![grafico_4.png](grafico_4.png)

```python
plot_bar_with_counts(data, "engine_type", color='gray')
```

![grafico_5.png](grafico_5.png)

**Engine Type Analysis**

**1.4.1 Most Common Engine Types**

- **4-stroke High-Speed** and **2-stroke Medium-Speed** engines are predominant in the dataset, with 1548 and 1529 records, respectively, together making up the majority of the data. This suggests that these types of engines are more common and represent a significant part of the dataset.
- **Fault distribution:**
  - **4-stroke High-Speed** and **2-stroke Medium-Speed** have the highest record counts across the main categories, such as **No Failure**, **Mechanical Wear**, **Fuel Issues**, and **Overheating**.
  - The **4-stroke High-Speed engine**, for example, has a total of 1548 occurrences, of which:
    - **679** are **No Failure**
    - **292** are **Mechanical Wear**
    - **281** are **Fuel Issues**
    - **188** are **Overheating**
  - Among actual failures, **mechanical wear** and **fuel issues** are the most frequent in these engines, which may reflect the more intense use of high-performance engines.

**1.4.2 Smallest Representation of Low-Speed Engines**

- **2-stroke Low-Speed** has the lowest number of records, with **796** occurrences in total. This indicates that this type of engine is less common, which may reflect its more specialized application, generally in industrial contexts or in large engines.
- **Fault distribution:**
  - The most frequent label for the **2-stroke Low-Speed engine** is **No Failure** (**343 records**); the main failure modes are:
    - **Mechanical Wear** (**153 records**)
    - **Fuel Issues** (**153 records**)
    - **Overheating** (**92 records**)
  - The **No Failure** share (343 of 796, about 43.1%) is close to that of the 4-stroke High-Speed engine (679 of 1548, about 43.9%), so the lower failure counts mainly reflect the smaller number of records rather than a clearly lower failure rate.
**1.4.3 Failure Analysis by Engine Type**

- **Common faults:**
  - **Mechanical Wear** is a significant failure for **4-stroke High-Speed** and **2-stroke Medium-Speed** engines, with **292** and **325** occurrences, respectively.
  - This suggests that, even though they are high-performance engines, **mechanical wear** is still one of the main problems, probably due to the **constant use and high operating load** of these engines.
- **Fuel Issues** are also a recurring failure, mainly in:
  - **4-stroke High-Speed** (**281 failures**)
  - **2-stroke Medium-Speed** (**297 failures**)
  - This may reflect fuel-related issues such as **contamination** or **injection system problems**.
- **Overheating** and **Oil Leakage** are also faults observed in all types of engines, but less frequently, especially in **4-stroke Medium-Speed** and **2-stroke Low-Speed** engines.

**1.4.4 Relatively Balanced Distribution**

- The distribution of failures by engine type shows a certain **balance** between categories, but:
  - **2-stroke Low-Speed** shows fewer **Mechanical Wear** and **Fuel Issues** failures in absolute terms (153 each) than the other engines.
  - It also has the fewest records (796), so each of these failure modes still accounts for about 19% of its records, in line with the other engine types. The lower counts alone therefore do not show that **low-speed engines** are **more robust**.
- For modeling, this means the analysis model will have to deal with only **a slight imbalance** across engine types, apart from the under-represented **2-stroke Low-Speed** category.

The **4-stroke High-Speed** and **2-stroke Medium-Speed** engines are the most common in the dataset and, as expected, they present a **higher number of failures** associated with **mechanical wear** and **fuel problems**, reflecting the more intensive use of these engines.
In contrast, **2-stroke Low-Speed** engines represent a smaller part of the dataset and therefore show **fewer failures in absolute terms**. Their failure proportions are close to those of the other types (about 43% of records are **No Failure** for both 2-stroke Low-Speed and 4-stroke High-Speed engines), so the data alone do not show that they are **more robust**. The distribution of failures by engine type is, in general, **balanced**, with **2-stroke Low-Speed** engines being the least represented. This distribution facilitates the analysis of engine performance.
--DIVIDER--### **1.5 Analysis of Manufacturer Quantity Distributions**

```python
plot_bar_with_hue(data=data, column="manufacturer", hue="failure_mode")
```

![grafico_7.png](grafico_7.png)

```python
plot_bar_with_counts(data, "manufacturer", color='purple')
```

![grafico_20.png](grafico_20.png)
--DIVIDER--We can conclude that MAN B&W has the largest share of the dataset, with 1562 records (about 30%), which may indicate greater adoption of its engines or a greater variety of models from this brand. Next comes Yanmar with 1062 records, already a significant drop compared to MAN B&W. Manufacturers such as Rolls-Royce, Wärtsilä, Caterpillar and Mitsubishi appear in smaller quantities, with 527 to 793 occurrences, which suggests a more specific presence in the dataset or less diversity of models.

When we look at failures, we can see how they are distributed within each manufacturer. MAN B&W, for example, presents faults such as Mechanical Wear (318 occurrences), Fuel Issues (284 occurrences) and Overheating (184 occurrences). Yanmar, the second most frequent, also records a considerable number of failures, mainly Mechanical Wear (195 occurrences), Fuel Issues (193 occurrences) and Overheating (130 occurrences).
This suggests that certain manufacturers may be more prone to specific failures, which could be indicative of specific engine characteristics or different operating conditions, though the counts should be read relative to each manufacturer's number of records.
--DIVIDER--### **1.6 Analysis of Engine Quantity Distributions**

```python
plot_bar_with_counts(data, "engine_id", color='pink')
```

![grafico_8.png](grafico_8.png)
--DIVIDER--**1.6.1 Uniform Distribution:**

- The distribution of `engine_id` values is completely uniform, with each engine (from **ENG_001** to **ENG_050**) having exactly 104 records. This indicates that the dataset was structured so that each engine has the same number of records, which helps ensure an equitable analysis of the engines in the dataset.

**1.6.2 Possible Indicator of Engine Diversity:**

- The fact that there are 50 different engine IDs, with the same number of records, suggests a diversity of engines in the dataset. This can be interesting for modeling, as the model can learn to generalize the characteristics of different types of engines.
--DIVIDER--### **1.7 Descriptive Statistical Analysis**

Considering that there are differences in the engine profiles, we will group the engines by engine type (`engine_type`) to perform the descriptive statistical analysis. This way, we will be able to analyze the variables according to each type of engine, identifying specific patterns for each category.
```python
# Grouping data by 'engine_type'
grouped = data.groupby('engine_type')

# Iterating through each 'engine_type' group and applying describe()
for engine, group in grouped:
    print(f"Descriptive statistics for the engine {engine}:")
    display(group.describe())
    print("\n" + "="*50 + "\n")
```

Descriptive statistics for the engine 2-stroke Low-Speed:

| Statistic | engine_temp | oil_pressure | fuel_consumption | vibration_level | rpm | engine_load | coolant_temp | exhaust_temp | running_period | fuel_consumption_per_hour |
|-----------|------------|--------------|------------------|----------------|-----|-------------|--------------|--------------|----------------|--------------------------|
| count | 796.000000 | 796.000000 | 796.000000 | 796.000000 | 796.000000 | 796.000000 | 796.000000 | 796.000000 | 796.000000 | 796.000000 |
| mean | 85.162926 | 7.289004 | 3932.811955 | 3.737143 | 1500.423723 | 50.046510 | 84.956753 | 449.816354 | 84.110651 | 116.325764 |
| std | 7.104492 | 0.753409 | 2537.696784 | 0.356628 | 183.740580 | 17.523761 | 7.281260 | 1.350857 | 50.047411 | 81.261187 |
| min | 66.228654 | 5.104294 | 1000.000000 | 2.628596 | 779.782979 | 20.056605 | 70.000000 | 433.126998 | 0.026501 | 100.000000 |
| 25% | 79.945180 | 6.703343 | 1793.669448 | 3.495721 | 1382.257315 | 34.907655 | 79.511771 | 450.000000 | 39.518859 | 100.000000 |
| 50% | 85.095806 | 7.501076 | 3387.200941 | 3.739999 | 1502.306758 | 49.660860 | 84.847404 | 450.000000 | 84.045377 | 100.000000 |
| 75% | 90.007971 | 8.000000 | 5660.098061 | 3.982576 | 1629.363548 | 65.718762 | 90.212482 | 450.000000 | 128.318703 | 100.000000 |
| max | 104.755546 | 8.000000 | 12157.288286 | 4.766604 | 2055.992725 | 79.973462 | 100.000000 | 450.000000 | 167.909123 | 800.000000 |

---

Descriptive statistics for the engine 2-stroke Medium-Speed:

| Statistic | engine_temp | oil_pressure | fuel_consumption | vibration_level | rpm | engine_load | coolant_temp | exhaust_temp | running_period | fuel_consumption_per_hour |
|------------|------------|--------------|------------------|----------------|------|-------------|-------------|--------------|---------------|--------------------------|
| count | 1529.000 | 1529.000 | 1529.000 | 1529.000 | 1529 | 1529.000 | 1529.000 | 1529.000 | 1529.000 | 1529.000 |
| mean | 85.015 | 7.281 | 3970.649 | 3.765 | 1499 | 49.835 | 85.004 | 449.851 | 85.409 | 115.867 |
| std | 7.223 | 0.756 | 2462.943 | 0.363 | 209 | 17.133 | 7.504 | 1.224 | 48.743 | 80.044 |
| min | 64.628 | 5.000 | 1000.000 | 2.500 | 762 | 20.067 | 70.000 | 426.124 | 0.042 | 100.000 |
| 25% | 79.539 | 6.724 | 1877.587 | 3.524 | 1359 | 34.928 | 79.610 | 450.000 | 42.729 | 100.000 |
| 50% | 85.190 | 7.483 | 3486.633 | 3.769 | 1494 | 50.315 | 84.883 | 450.000 | 87.375 | 100.000 |
| 75% | 90.139 | 8.000 | 5704.353 | 4.018 | 1641 | 64.246 | 90.208 | 450.000 | 128.045 | 100.000 |
| max | 106.217 | 8.000 | 11597.795 | 4.904 | 2206 | 79.983 | 100.000 | 450.000 | 167.857 | 800.000 |

---

Descriptive statistics for the engine 4-stroke High-Speed:

| Statistic | engine_temp | oil_pressure | fuel_consumption | vibration_level | rpm | engine_load | coolant_temp | exhaust_temp | running_period | fuel_consumption_per_hour |
|------------|------------|--------------|------------------|----------------|------|-------------|-------------|--------------|---------------|--------------------------|
| count | 1548.000 | 1548.000 | 1548.000 | 1548.000 | 1548 | 1548.000 | 1548.000 | 1548.000 | 1548.000 | 1548.000 |
| mean | 84.960 | 7.279 | 3872.838 | 3.740 | 1492 | 49.473 | 84.836 | 449.790 | 82.932 | 117.346 |
| std | 7.264 | 0.745 | 2385.983 | 0.370 | 208 | 17.451 | 7.472 | 1.451 | 48.368 | 88.570 |
| min | 63.358 | 5.000 | 1000.000 | 2.500 | 733 | 20.036 | 70.000 | 429.901 | 0.052 | 100.000 |
| 25% | 79.941 | 6.724 | 1888.066 | 3.496 | 1348 | 33.764 | 79.620 | 450.000 | 41.115 | 100.000 |
| 50% | 84.806 | 7.499 | 3422.351 | 3.741 | 1490 | 49.556 | 84.695 | 450.000 | 82.379 | 100.000 |
| 75% | 90.222 | 8.000 | 5407.048 | 3.998 | 1633 | 64.465 | 90.379 | 450.000 | 125.912 | 100.000 |
| max | 104.856 | 8.000 | 11597.686 | 4.982 | 2175 | 79.901 | 100.000 | 450.000 | 167.773 | 800.000 |

---

Descriptive statistics for the engine 4-stroke Medium-Speed:

| Statistic | engine_temp | oil_pressure | fuel_consumption | vibration_level | rpm | engine_load | coolant_temp | exhaust_temp | running_period | fuel_consumption_per_hour |
|------------|------------|--------------|------------------|----------------|------|-------------|-------------|--------------|---------------|--------------------------|
| count | 1327.000 | 1327.000 | 1327.000 | 1327.000 | 1327 | 1327.000 | 1327.000 | 1327.000 | 1327.000 | 1327.000 |
| mean | 85.198 | 7.280 | 3978.232 | 3.738 | 1497 | 49.974 | 85.214 | 449.858 | 83.734 | 115.446 |
| std | 7.335 | 0.748 | 2414.796 | 0.372 | 201 | 17.453 | 7.563 | 1.173 | 49.146 | 77.517 |
| min | 60.525 | 5.000 | 1000.000 | 2.500 | 865 | 20.001 | 70.000 | 426.829 | 0.238 | 100.000 |
| 25% | 80.034 | 6.730 | 1963.813 | 3.490 | 1366 | 35.393 | 79.521 | 450.000 | 41.118 | 100.000 |
| 50% | 85.069 | 7.467 | 3502.965 | 3.731 | 1502 | 50.142 | 85.449 | 450.000 | 82.340 | 100.000 |
| 75% | 90.582 | 8.000 | 5656.639 | 3.984 | 1638 | 65.530 | 90.714 | 450.000 | 126.574 | 100.000 |
| max | 107.358 | 8.000 | 12672.543 | 5.000 | 2132 | 79.965 | 100.000 | 450.000 | 167.987 | 800.000 |
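Since the tables above report both the mean and the standard deviation, one compact way to compare dispersion across engine types is the coefficient of variation (std / mean). The sketch below applies it to `fuel_consumption`, with values copied (rounded) from the tables. The dictionary and variable names are illustrative.

```python
# (mean, std) of fuel_consumption per engine type, rounded from the tables above
fuel_stats = {
    "2-stroke Low-Speed": (3932.81, 2537.70),
    "2-stroke Medium-Speed": (3970.65, 2462.94),
    "4-stroke High-Speed": (3872.84, 2385.98),
    "4-stroke Medium-Speed": (3978.23, 2414.80),
}

# Coefficient of variation: dispersion relative to the mean
cv = {engine: std / mean for engine, (mean, std) in fuel_stats.items()}
most_variable = max(cv, key=cv.get)

for engine, value in cv.items():
    print(f"{engine:<22} CV = {value:.3f}")
print("Highest relative dispersion:", most_variable)
```

With the DataFrame available, the same inputs follow from `data.groupby("engine_type")["fuel_consumption"].agg(["mean", "std"])`.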

**1.7.1 Comparison of Means**

- The means of the main variables show only small variations between engine types:
  - **Engine temperature**: Similar average values (~85°C), with small variations.
  - **Oil pressure**: Average consistently around 7.28-7.29, indicating stability in this parameter.
  - **Fuel consumption (fuel_consumption)**: Average values slightly higher in Medium-Speed engines (~3970-3978) and lowest in 4-stroke High-Speed engines (~3873).
  - **Vibration level**: Average values between about 3.74 and 3.77, with small differences.
  - **RPM**: Average close to 1500 RPM for all engines.
  - **Engine load (engine_load)**: Similar average values (~50), indicating that the engines operate at similar loads.

**1.7.2 Comparison of Dispersions (Standard Deviation - std)**

- **Engine temperature**: Close standard deviations (~7°C), indicating a moderate variation in the data.
- **Oil pressure**: Low variation (~0.75), showing stable behavior between engines.
- **Fuel consumption**: High variation, especially in Low-Speed engines, indicating a greater range between minimum and maximum values.
- **Vibration level**: Small variation (~0.36-0.37), showing consistent behavior.
- **RPM**: Greater variation in Medium-Speed and High-Speed engines (std ~201-209) than in Low-Speed engines (~184), possibly due to operational adjustments.

**1.7.3 Comparison of Minimum and Maximum**

- The 4-stroke Medium-Speed engine has the lowest minimum temperature (60.53°C) and the highest maximum temperature (107.36°C), indicating a greater operating range.
- The 2-stroke Low-Speed engine spans about 780 to 2056 RPM, the lowest maximum of all types and a narrower range than that of the 4-stroke High-Speed engine (733–2175).
- Oil pressure has an upper limit of 8.0 for all engines.
- Fuel consumption per hour ranges from 100 to 800 for all engines. Total fuel consumption reaches its highest maximum (12672.54) in the 4-stroke Medium-Speed engine.

**1.7.4 Key Insights**

- Low-Speed and Medium-Speed engines have slightly higher mean fuel consumption (~3930-3980) than 4-stroke High-Speed engines (~3870), possibly related to operation under heavier loads.
- 4-stroke High-Speed engines have a slightly lower consumption, which may suggest greater energy efficiency for high-speed operations.
- Medium-Speed engines show slightly greater temperature variation, which may indicate a greater need for thermal monitoring.
- The vibration level is stable across all engines, without major differences between categories.

The four engine types, despite being from different manufacturers, exhibit similar behavior in terms of temperature, oil pressure, RPM, and vibration. Low-Speed and Medium-Speed engines consume slightly more fuel on average, whereas High-Speed engines show slightly lower fuel consumption. Temperature variation is similar across types and marginally larger for 4-stroke Medium-Speed engines.

--DIVIDER--### **1.8 Analysis of Distribution and Density Curve by Engine Type**

To better understand the behavior of the different engines, we analyze the distribution of their main operational variables, such as temperature, oil pressure, RPM and vibration. We also use density curves to visualize patterns and identify possible differences between engine types. This analysis can provide insights into fuel consumption, energy efficiency and thermal variations, helping to optimize performance and predict failures.

```python
def plot_histograms_with_kde(data, variables):
    """
    Plots histograms with the density curve (KDE) for several variables,
    grouped by engine type.

    Args:
    - data: DataFrame with the data.
    - variables: List of variables (columns) to be analyzed.
    """
    # Number of subplots and rows (two plots per row)
    n = len(variables)
    n_rows = (n // 2) + (n % 2)

    # Create the figure with multiple subplots
    fig, axes = plt.subplots(nrows=n_rows, ncols=2, figsize=(14, 6 * n_rows))
    axes = axes.flatten()  # Flatten for easy indexing

    for i, variable in enumerate(variables):
        sns.histplot(data=data, x=variable, hue='engine_type', kde=True,
                     multiple='stack', bins=20, palette='Set2', ax=axes[i])

        # Add title and labels
        axes[i].set_title(f'Distribution of {variable} by Engine Type')
        axes[i].set_xlabel(variable)
        axes[i].set_ylabel('Density/Frequency')

    # Hide the unused subplot when the number of variables is odd
    for ax in axes[n:]:
        ax.axis('off')

    # Adjust the layout so the subplots do not overlap
    plt.tight_layout()
    plt.show()
```

```python
# Calling the function
plot_histograms_with_kde(data, ['engine_temp', 'oil_pressure', 'fuel_consumption', 'vibration_level',
                                'rpm', 'engine_load', 'coolant_temp', 'exhaust_temp',
                                'running_period', 'fuel_consumption_per_hour'])
```

![grafico_9.png](grafico_9.png)

**1.8.1 Distribution of Variables**

Analysis of the distribution of variables can reveal important patterns in engine operation, allowing the identification of trends and distinct operational behaviors between different types of engines.

**1.8.2 Variables with Normal Distribution (Bell Curve)**

Some variables follow an approximately normal distribution, characterized by the concentration of most values around the mean and a symmetrical dispersion, with few extreme values:

- `engine_temp` (Engine temperature)
- `rpm` (Revolutions Per Minute)
- `vibration_level` (Vibration Level)

High-Speed (4-stroke) engines tend to have higher average `rpm` and `engine_temp` values, operating in higher ranges compared to Low-Speed (2-stroke) engines, which operate at lower speeds. The vibration level, on the other hand, presents a similar distribution between the engines, but with greater dispersion in the High-Speed engines, indicating greater variability in the oscillations.
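The visual reading of the histograms can be backed with numbers. The sketch below is a minimal illustration, assuming synthetic stand-in data rather than the real dataset; `summarize_shape` is a hypothetical helper name. It computes skewness and excess kurtosis per column: values near zero are consistent with a bell-shaped distribution, while a clearly positive skewness points to a long right tail.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis


def summarize_shape(df, columns):
    """Return skewness and excess kurtosis for each column (hypothetical helper)."""
    rows = []
    for col in columns:
        values = df[col].dropna().to_numpy()
        rows.append({
            "variable": col,
            "skewness": skew(values),
            # Fisher definition: 0 for a normal distribution
            "excess_kurtosis": kurtosis(values),
        })
    return pd.DataFrame(rows).set_index("variable")


# Synthetic stand-in data (assumption: NOT the real engine dataset)
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "symmetric_var": rng.normal(loc=85, scale=7, size=5000),
    "right_skewed_var": rng.lognormal(mean=0.0, sigma=0.8, size=5000),
})

shape_summary = summarize_shape(demo, ["symmetric_var", "right_skewed_var"])
print(shape_summary)
```

On the real data, the same helper could be applied per engine type (for example after a `groupby('engine_type')`) to check whether the shape differs between categories.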

**1.8.3 Variables with Asymmetric Distribution (Long Tail to the Right)**

Some variables present positive skewness, that is, they have a long tail to the right. This means that although most values are concentrated in a certain range, there are some exceptionally high values that increase the average and widen the dispersion of the data.

- `fuel_consumption` (Total Fuel Consumption)
- `fuel_consumption_per_hour` (Fuel Consumption Per Hour)
- `engine_load` (Engine Load - %)
- `exhaust_temp` (Exhaust Gas Temperature)

Low-Speed engines (2-stroke) have less variability in fuel consumption, with a more predictable and stable pattern. High-Speed engines (4-stroke) record extreme values of fuel consumption, reflecting intense operational peaks. This behavior may be associated with variations in engine load and power demands during operation.

--DIVIDER--

In the next stage, we check for outliers in the variables of our dataset. Outliers may indicate atypical behaviors or measurement errors that can affect the analysis models.

```python
def plot_boxplots(df, columns, figsize=(15, 10)):
    """
    Plots boxplots for each specified numeric variable.

    Parameters:
    df (pd.DataFrame): DataFrame containing the data.
    columns (list): List of numeric columns to be plotted.
    figsize (tuple): Size of the figure (width, height).
    """
    num_vars = len(columns)
    fig, axes = plt.subplots(nrows=num_vars, ncols=1, figsize=figsize)

    # One horizontal boxplot per variable
    for i, col in enumerate(columns):
        sns.boxplot(data=df, x=col, ax=axes[i], color='skyblue')
        axes[i].set_title(f'Boxplot of {col}')
        axes[i].set_xlabel('')

    plt.tight_layout()
    plt.show()
```

```python
# List of numeric columns
num_cols = ['engine_temp', 'oil_pressure', 'fuel_consumption', 'vibration_level', 'rpm',
            'engine_load', 'coolant_temp', 'exhaust_temp', 'running_period',
            'fuel_consumption_per_hour']

# Call the function to plot the boxplots
plot_boxplots(data, num_cols)
```

![grafico_10.png](grafico_10.png)

By analyzing the boxplots of the engines above, it is possible to identify the presence of outliers in the following variables:

- `engine_temp` (Engine temperature)
- `fuel_consumption` (Fuel consumption)
- `fuel_consumption_per_hour` (Fuel consumption per hour)
- `exhaust_temp` (Exhaust temperature)
- `vibration_level` (Vibration level)
- `rpm` (Revolutions per minute)

The presence of outliers may be associated with engine failure conditions, indicating atypical behaviors that may reflect operational problems. However, it is also possible that these outliers occur due to profile differences between engine types, such as high-speed and low-speed engines, which have different operating characteristics. Therefore, it is important to further investigate whether these extreme values are related to specific faults or whether they are simply a reflection of differences between engines.
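One way to separate the two explanations is to compare how many points the IQR rule flags on the pooled data with how many it flags inside each engine type. The sketch below is an illustration on synthetic data only; the helper names `iqr_outlier_mask` and `count_outliers_global_vs_grouped`, and the two made-up engine profiles, are assumptions and not part of the original analysis.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(series, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def count_outliers_global_vs_grouped(df, col, group_col):
    """Count IQR outliers on the pooled data and within each group."""
    global_count = int(iqr_outlier_mask(df[col]).sum())
    grouped_count = sum(
        int(iqr_outlier_mask(group_values).sum())
        for _, group_values in df.groupby(group_col)[col]
    )
    return global_count, grouped_count


# Synthetic stand-in: two engine profiles with very different typical values
rng = np.random.default_rng(7)
demo = pd.DataFrame({
    "engine_type": ["low_speed"] * 950 + ["high_speed"] * 50,
    "rpm": np.concatenate([
        rng.normal(100, 5, size=950),    # majority profile
        rng.normal(1000, 5, size=50),    # minority profile, far from the rest
    ]),
})

global_count, grouped_count = count_outliers_global_vs_grouped(demo, "rpm", "engine_type")
print(f"Pooled outliers: {global_count}, within-group outliers: {grouped_count}")
```

If the pooled count is much larger than the within-group count, many of the flagged points are simply normal readings of a different engine profile rather than anomalies.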

To investigate this hypothesis, we created an auxiliary variable called `Failure_Binary`. This variable has the value 0 for cases in which there is no failure and the value 1 for cases with failure. With this, we can compare the behavior of variables with and without outliers in relation to engine failures.

--DIVIDER--

```python
# Create the failure binary variable
data['Failure_Binary'] = np.where(data['failure_mode'] == 'No Failure', 0, 1)
```

```python
def detect_outliers(df, col):
    """Return the rows whose value in `col` lies outside the 1.5 * IQR fences."""
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[col] < lower_bound) | (df[col] > upper_bound)]


# Select relevant variables for outlier analysis
variables = ['engine_temp', 'fuel_consumption', 'vibration_level', 'rpm',
             'exhaust_temp', 'fuel_consumption_per_hour']

# Detect outliers for each variable
outliers_dict = {var: detect_outliers(data, var) for var in variables}


def plot_outliers_bar_chart(outliers_dict):
    # Initialize lists for variables and outlier quantities
    variables_list = []
    outliers_count_list = []

    # Fill the lists with data from the outlier dictionary
    for var, outliers_df in outliers_dict.items():
        variables_list.append(var)
        outliers_count_list.append(len(outliers_df))

    # Create the horizontal bar chart
    plt.figure(figsize=(10, 6))
    bars = plt.barh(variables_list, outliers_count_list, color='skyblue')

    # Add title and labels to axes
    plt.title('Number of Outliers per Variable')
    plt.xlabel('Number of Outliers')
    plt.ylabel('Variables')

    # Add the count at the end of each bar
    for bar in bars:
        plt.text(bar.get_width(), bar.get_y() + bar.get_height() / 2,
                 str(int(bar.get_width())), va='center', ha='left', color='black')

    # Display the graph
    plt.show()
```

```python
# Plot the chart with the count shown at the end of each bar
plot_outliers_bar_chart(outliers_dict)
```

![grafico_11.png](grafico_11.png)

--DIVIDER--

Among the variables analyzed, `fuel_consumption_per_hour` presented the largest number of outliers. Now, let's investigate the behavior of variables under failure and non-failure conditions by plotting boxplots for each variable against the `Failure_Binary` column:

```python
def plot_boxplots_for_variables(variables, data, title_prefix='Boxplot of'):
    """
    Plots the boxplots of the variables against the 'Failure_Binary' column in a single figure.

    Parameters:
    - variables: List with the names of the variables to be analyzed.
    - data: DataFrame containing the data.
    - title_prefix: Prefix for the chart title.
    """
    # Figure configuration
    plt.figure(figsize=(28, 8))

    # Loop to create one boxplot per variable
    for i, var in enumerate(variables, 1):
        plt.subplot(1, len(variables), i)
        sns.boxplot(x='Failure_Binary', y=var, data=data, palette="coolwarm")
        plt.title(f'{title_prefix} {var} vs. Failure')
        plt.xlabel('Failed (0 = Normal, 1 = Failed)')
        plt.ylabel(var)

    # Adjust the layout so the plots do not overlap
    plt.tight_layout()
    plt.show()
```

```python
# List of variables for analysis
variables = ['engine_temp', 'fuel_consumption', 'vibration_level', 'rpm',
             'exhaust_temp', 'fuel_consumption_per_hour']

# Call the function to generate the boxplots
plot_boxplots_for_variables(variables, data)
```

![grafico_12.png](grafico_12.png)

The boxplots reveal visible differences in the behavior of several variables between failure conditions and normal operation, which suggests that the outliers may be associated with engine failures. To check whether these differences are statistically significant, we applied the **Mann-Whitney U statistical test**. This non-parametric test compares the distributions of each variable between the groups with and without failure. Below are the results:

```python
from scipy.stats import mannwhitneyu


def test_statistical_difference(data, variables):
    for var in variables:
        # Split the variable into normal and failure groups
        normal = data[data['Failure_Binary'] == 0][var]
        failure = data[data['Failure_Binary'] == 1][var]

        stat, p_value = mannwhitneyu(normal, failure, alternative='two-sided')
        print(f"{var}: U={stat:.2f}, p-value={p_value:.5f}")

        if p_value < 0.05:
            print("  -> Statistically significant difference between normal and failure cases!\n")
        else:
            print("  -> There is not enough evidence to conclude that there is a difference.\n")
```

```python
variables = ['engine_temp', 'fuel_consumption', 'vibration_level', 'rpm',
             'exhaust_temp', 'fuel_consumption_per_hour']
test_statistical_difference(data, variables)
```

    engine_temp: U=2361907.00, p-value=0.00000
      -> Statistically significant difference between normal and failure cases!

    fuel_consumption: U=1853117.00, p-value=0.00000
      -> Statistically significant difference between normal and failure cases!

    vibration_level: U=2862284.50, p-value=0.00000
      -> Statistically significant difference between normal and failure cases!

    rpm: U=3272684.00, p-value=0.81185
      -> There is not enough evidence to conclude that there is a difference.

    exhaust_temp: U=3233628.00, p-value=0.00123
      -> Statistically significant difference between normal and failure cases!

    fuel_consumption_per_hour: U=2891093.50, p-value=0.00000
      -> Statistically significant difference between normal and failure cases!

**1.8.4 Variables with Statistically Significant Difference:**

- Engine Temperature (`engine_temp`): The difference between failing engines and normal engines is statistically significant, suggesting that variations in temperature may be directly associated with failures.

- Total Fuel Consumption (`fuel_consumption`): There is a significant discrepancy between the two groups, indicating that failures can directly affect fuel consumption.

- Vibration Level (`vibration_level`): Engine vibration presents significant differences, which may indicate that failing engines exhibit anomalous vibration patterns.

- Exhaust Gas Temperature (`exhaust_temp`): The statistical difference suggests that the thermal exhaust of engines can be an important indicator for monitoring failures.

- Fuel Consumption per Hour (`fuel_consumption_per_hour`): Just like total consumption, this variable also presents differences between normal and failing engines, reinforcing the relationship between failure and variation in energy consumption.
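With thousands of observations, a very small shift between groups can already produce a tiny p-value, so it helps to report an effect size next to the test. The sketch below is a minimal illustration on synthetic data, assuming SciPy 1.7 or later (where `mannwhitneyu` returns the U statistic of the first sample); `mann_whitney_with_effect_size` is a hypothetical helper name. It derives the rank-biserial correlation from U, which ranges from -1 to 1 and is 0 when neither group tends to have larger values.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def mann_whitney_with_effect_size(sample_a, sample_b):
    """Hypothetical helper: Mann-Whitney U test plus rank-biserial correlation."""
    sample_a = np.asarray(sample_a, dtype=float)
    sample_b = np.asarray(sample_b, dtype=float)
    u_stat, p_value = mannwhitneyu(sample_a, sample_b, alternative="two-sided")

    # Assumption: SciPy >= 1.7, so u_stat is the U statistic of sample_a.
    # U / (n_a * n_b) estimates P(a > b) + 0.5 * P(a == b).
    prob_a_greater = u_stat / (len(sample_a) * len(sample_b))
    rank_biserial = 2 * prob_a_greater - 1
    return u_stat, p_value, rank_biserial


# Synthetic stand-in: two large groups separated by a small shift
rng = np.random.default_rng(1)
normal_runs = rng.normal(loc=85.0, scale=7.0, size=4000)
failure_runs = rng.normal(loc=85.7, scale=7.0, size=4000)

u_stat, p_value, effect = mann_whitney_with_effect_size(failure_runs, normal_runs)
print(f"U={u_stat:.1f}, p-value={p_value:.5f}, rank-biserial r={effect:.3f}")
```

Applied to the real groups, an effect size close to zero alongside a small p-value would indicate a difference that is detectable but of limited practical use as a failure indicator.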

**1.8.5 Variable Without Statistically Significant Difference:**

- Revolutions per Minute (`rpm`): The high p-value indicates that there is not enough evidence to state that there is a significant difference in the behavior of this variable between normal and failing engines. This suggests that engine speed may not be a reliable indicator of failure, or that failures may occur in different speed ranges without a clear pattern.

```python
# Remove the auxiliary Failure_Binary column created for this analysis
data = data.drop(columns='Failure_Binary')
```

--DIVIDER--

### **1.9 Correlation Analysis**

Correlation analysis is a fundamental step to identify relationships between variables in the data set. The objective is to verify which variables present association patterns, which can be useful to detect possible indicators of engine failure.

```python
# Step 1: Remove unneeded columns
data_one_hot_encode = data.drop(columns=['engine_id', 'timestamp'])

# Step 2: Select the categorical variables (strings)
categorical_columns = data_one_hot_encode.select_dtypes(include=['object']).columns

# Step 3: Apply One-Hot Encoding without removing the first category
data_one_hot_encode = pd.get_dummies(data_one_hot_encode, columns=categorical_columns, drop_first=False)

# Step 4: Convert True/False to 1/0. Only the boolean dummy columns are cast,
# so continuous readings such as oil_pressure are not truncated to integers.
bool_columns = data_one_hot_encode.select_dtypes(include='bool').columns
data_one_hot_encode[bool_columns] = data_one_hot_encode[bool_columns].astype(int)
```

```python
# Step 5: Calculate the correlation matrix (Spearman rank correlation)
correlation_matrix = data_one_hot_encode.corr(method='spearman')

# Step 6: Plot the correlation matrix
plt.figure(figsize=(18, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()
```

![grafico_13.png](grafico_13.png)

From the correlation matrix, it is possible to check which variables are most strongly correlated with each other and which features are potentially correlated with engine failures.

To explore these correlations, we create a function that plots the variable pairs, so that we can assess not only the strength of each correlation but also its shape.

```python
def plot_scatter(ax, x, y, xlabel, ylabel, title):
    """Helper function to create scatter plots"""
    ax.scatter(x, y, color='blue', alpha=0.5, label='Data')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    ax.grid(True)


def plot_correlations(data, column_pairs, ncols=4, figsize=(16, 8)):
    """
    Generates scatter plots for specified column pairs

    Parameters:
    data (DataFrame): DataFrame with the data
    column_pairs (list): List of tuples with column pairs to plot [(x1, y1), (x2, y2), ...]
    ncols (int): Number of chart columns per row
    figsize (tuple): Size of the figure (width, height)
    """
    n_pairs = len(column_pairs)
    nrows = (n_pairs + ncols - 1) // ncols  # Calculate the required number of rows

    fig, axes = plt.subplots(nrows, ncols, figsize=figsize)
    axes = axes.flatten()  # Transform the axes matrix into a flat array

    # Plot each pair of columns
    for i, (x_col, y_col) in enumerate(column_pairs):
        plot_scatter(axes[i], data[x_col], data[y_col], x_col, y_col, f'{x_col} vs {y_col}')

    # Disable unused axes
    for j in range(i + 1, len(axes)):
        axes[j].axis('off')

    plt.tight_layout()
    plt.show()
```

**Strong Correlations**

- **engine_temp** and **coolant_temp** → 0.92
- **engine_temp** and **engine_load** → 0.73
- **oil_pressure** and **engine_load** → 0.84
- **fuel_consumption** and **running_period** → 0.77
- **severity_Requires Maintenance** and **failure_mode_Fuel Issues** → 0.81
- **failure_mode_Mechanical Wear** and **severity_Normal** → 0.72

```python
# List of column pairs to correlate
strong_correlation_pairs = [('engine_temp', 'coolant_temp'),
                            ('engine_temp', 'engine_load'),
                            ('oil_pressure', 'engine_load'),
                            ('fuel_consumption', 'running_period'),
                            ('failure_mode_Overheating', 'engine_temp')]

# Function call
plot_correlations(data_one_hot_encode, column_pairs=strong_correlation_pairs, ncols=3, figsize=(20, 8))
```

![grafico_14.png](grafico_14.png)

**1.9.1 Relationship Between Engine Temperature and Coolant Temperature**

- There is an almost linear positive relationship between engine temperature (`engine_temp`) and coolant temperature (`coolant_temp`).
- This suggests that the cooling system is responding proportionally to changes in engine temperature, ensuring optimal thermal management.
- Significant deviations from this trend (e.g., `engine_temp` rising sharply while `coolant_temp` remains constant) could indicate cooling system issues, such as a faulty radiator, an inefficient water pump, or sensor inaccuracies.

**1.9.2 Engine Temperature vs. Engine Load**

- There is a clear positive correlation: as engine load (`engine_load`) increases, engine temperature (`engine_temp`) also tends to rise.
- This is expected, as higher loads require more power output, generating additional heat.
- However, if an engine operates at low loads but exhibits high temperatures, this may indicate thermal inefficiency, poor lubrication, or cooling system malfunctions.

**1.9.3 Oil Pressure vs. Engine Load**

- Oil pressure (`oil_pressure`) values appear to be concentrated at a few discrete levels, suggesting that the oil pressure regulation system maintains it within specific ranges. This banding can also arise if continuous readings are truncated to integers before plotting, so it is worth confirming on the raw values.
- The system may be designed to stabilize pressure regardless of variations in engine load (`engine_load`).
- If oil pressure does not vary as expected (e.g., remains too low under high loads), this could indicate issues such as an oil pump failure, improper oil viscosity, or potential leaks.

**1.9.4 Fuel Consumption vs. Running Period**

- There is a positive yet nonlinear relationship between fuel consumption (`fuel_consumption`) and running time (`running_period`). Initially, fuel consumption increases rapidly but then stabilizes.
- New or recently started engines may consume more fuel at first to reach optimal operating temperature. Over time, fuel consumption tends to level off as thermal efficiency improves.
- A deeper analysis could help identify engines consuming more fuel than expected for a given operating period, potentially indicating inefficiencies, leaks, or suboptimal combustion.

**1.9.5 Failure Mode (Overheating) vs. Engine Temperature**

- Engines that experienced overheating failures (`failure_mode_Overheating = 1`) show significantly higher temperatures than normal engines (`failure_mode_Overheating = 0`).
- This confirms that excessive temperatures are a critical factor in engine failures.
- A critical threshold for `engine_temp` can be defined, above which the risk of overheating-related failures increases sharply. This threshold could be used for real-time alerts and predictive maintenance to prevent breakdowns.

**Moderate Correlations:**

| **Moderate Positive Correlations (0.4 to 0.7)** | **Moderate Negative Correlations (-0.4 to -0.7)** |
|----------------------------------------------------|---------------------------------------------------|
| **engine_temp** and **oil_pressure** → 0.62 | **failure_mode_Oil Leakage** and **failure_mode_Overheating** → -0.47 |
| **oil_pressure** and **coolant_temp** → 0.57 | **failure_mode_Oil Leakage** and **severity_Critical** → -0.50 |
| **engine_load** and **coolant_temp** → 0.67 | **failure_mode_Oil Leakage** and **severity_Requires Maintenance** → -0.50 |
| **engine_load** and **exhaust_temp** → 0.67 | **failure_mode_Mechanical Wear** and **severity_Critical** → -0.42 |
| **fuel_consumption** and **engine_load** → 0.44 | **failure_mode_Mechanical Wear** and **severity_Normal** → -0.40 |
| **failure_mode_Overheating** and **severity_Critical** → 0.54 | **failure_mode_No Failure** and **severity_Critical** → -0.59 |
| **failure_mode_Overheating** and **severity_Requires Maintenance** → 0.47 | **failure_mode_No Failure** and **severity_Requires Maintenance** → -0.59 |
| **failure_mode_Overheating** and **severity_Normal** → 0.56 | |

```python
# List of column pairs to correlate
moderate_correlation_pairs = [('engine_temp', 'oil_pressure'),
                              ('oil_pressure', 'coolant_temp'),
                              ('engine_load', 'coolant_temp'),
                              ('engine_load', 'exhaust_temp'),
                              ('fuel_consumption', 'engine_load')]

# Function call
plot_correlations(data_one_hot_encode, column_pairs=moderate_correlation_pairs, ncols=3, figsize=(18, 9))
```

![grafico_15.png](grafico_15.png)

**1.9.6 Engine Temperature (engine_temp) and Oil Pressure (oil_pressure)**

- This confirms the trend observed in the graph: as engine temperature increases, oil pressure also tends to rise. This may indicate a thermal effect on oil viscosity or a response from the engine lubrication system.

**1.9.7 Oil Pressure (oil_pressure) and Coolant Temperature (coolant_temp)**

- This result indicates that engines operating with higher oil pressure tend to have higher coolant temperatures. This may be related to the efficiency of the lubrication and cooling system.

**1.9.8 Engine Load (engine_load) and Coolant Temperature (coolant_temp)**

- The graph already indicated this positive relationship. Engines under higher load generate more heat, demanding more from the cooling system.

**1.9.9 Engine Load (engine_load) and Exhaust Temperature (exhaust_temp)**

- This correlation reinforces that as engine load increases, exhaust gas temperature also rises. This behavior is expected, as more intense combustion generates greater thermal dissipation.

**1.9.10 Fuel Consumption (fuel_consumption) and Engine Load (engine_load)**

- This relationship was evident in the graph. Fuel consumption increases as engine load rises, which is expected in internal combustion engines.

**1.9.11 Failure Mode Overheating (failure_mode_Overheating) and Critical Severity (severity_Critical)**

- This suggests that when overheating occurs, there is a moderate tendency for it to result in critical failures. This reinforces the need to monitor engine temperature.

**1.9.12 Failure Mode Overheating (failure_mode_Overheating) and Requires Maintenance Severity (severity_Requires Maintenance)**

- In some cases, overheating may not be critical but may indicate the need for maintenance.

**1.9.13 Failure Mode Overheating (failure_mode_Overheating) and Normal Severity (severity_Normal)**

- Interestingly, there is a moderate positive correlation between overheating and failures classified as normal. This may indicate that overheating does not always lead to critical problems, depending on its intensity and duration.

**1.9.14 Oil Leakage Failure (failure_mode_Oil Leakage) and Overheating Failure (failure_mode_Overheating)**

- Both columns are one-hot encodings of the same `failure_mode` variable, so a record can never have both set to 1. A negative correlation between them is therefore expected by construction and does not, on its own, say anything about a physical relationship between oil leaks and overheating.

**1.9.15 Oil Leakage Failure (failure_mode_Oil Leakage) and Critical Severity (severity_Critical)**

- Oil leaks may be less likely to cause critical failures, possibly because they are detected before becoming severe.

**1.9.16 Oil Leakage Failure (failure_mode_Oil Leakage) and Requires Maintenance Severity (severity_Requires Maintenance)**

- This reinforces the idea that oil leaks are not among the main causes requiring immediate corrective maintenance.

**1.9.17 Mechanical Wear Failure (failure_mode_Mechanical Wear) and Critical Severity (severity_Critical)**

- This indicates that mechanical wear failures are not the main contributors to critical failures, possibly occurring gradually and predictably.

**1.9.18 Mechanical Wear Failure (failure_mode_Mechanical Wear) and Normal Severity (severity_Normal)**

- Mechanical wear seems to occur less frequently in scenarios considered normal.

**1.9.19 No Failure Mode (failure_mode_No Failure) and Critical Severity (severity_Critical)**

- The fewer failures occur, the lower the probability of a critical situation, which is intuitive.

**1.9.20 No Failure Mode (failure_mode_No Failure) and Requires Maintenance Severity (severity_Requires Maintenance)**

- This reinforces that when the system operates without failures, the need for corrective maintenance decreases.
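When reading negative correlations between two dummy columns derived from the same categorical variable, it helps to remember that such columns are mutually exclusive, which by itself forces a negative coefficient. The sketch below illustrates this on synthetic labels; the category frequencies are made up and are not those of the real dataset. For two mutually exclusive indicators with frequencies p and q, the correlation equals -sqrt(p*q / ((1-p)*(1-q))), and Spearman coincides with Pearson for binary columns.

```python
import numpy as np
import pandas as pd

# Synthetic failure labels (assumption: frequencies chosen only for illustration)
rng = np.random.default_rng(0)
modes = rng.choice(["No Failure", "Overheating", "Oil Leakage"],
                   size=2000, p=[0.6, 0.2, 0.2])

# One-hot encode the single categorical column, as in the correlation analysis
dummies = pd.get_dummies(pd.Series(modes, name="failure_mode"),
                         prefix="failure_mode").astype(int)

corr = dummies.corr(method="spearman")
observed = corr.loc["failure_mode_Oil Leakage", "failure_mode_Overheating"]

# Closed-form value implied purely by mutual exclusivity
p = dummies["failure_mode_Oil Leakage"].mean()
q = dummies["failure_mode_Overheating"].mean()
expected = -np.sqrt(p * q / ((1 - p) * (1 - q)))

print(f"observed={observed:.4f}, expected from frequencies={expected:.4f}")
```

A coefficient close to this baseline reflects only the category frequencies; only deviations in correlations across different variables (for example a failure mode against a severity level) carry information about how the categories relate.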

**Weak Correlations**

| **Weak Positive Correlations (0.2 to 0.4)** | **Weak Negative Correlations (-0.2 to -0.4)** |
|-------------------------------------------------------------|-----------------------------------------------------------|
| **engine_temp** and **fuel_consumption** → 0.33 | **fuel_consumption** and **failure_mode_No Failure** → -0.24 |
| **engine_temp** and **vibration_level** → 0.31 | **fuel_consumption** and **failure_mode_Oil Leakage** → -0.28 |
| **engine_temp** and **exhaust_temp** → 0.24 | **failure_mode_No Failure** and **severity_Critical** → -0.24 |
| **fuel_consumption** and **engine_load** → 0.30 | **failure_mode_No Failure** and **severity_Normal** → -0.37 |
| **coolant_temp** and **engine_load** → 0.36 | **failure_mode_No Failure** and **severity_Requires Maintenance** → -0.26 |
| **coolant_temp** and **exhaust_temp** → 0.23 | **failure_mode_Oil Leakage** and **severity_Critical** → -0.24 |
| **failure_mode_Fuel Issues** and **failure_mode_Mechanical Wear** → 0.25 | **failure_mode_Oil Leakage** and **severity_Normal** → -0.32 |
| **failure_mode_Fuel Issues** and **severity_Critical** → 0.31 | **failure_mode_Oil Leakage** and **severity_Requires Maintenance** → -0.22 |
| **failure_mode_Overheating** and **severity_Critical** → 0.34 | **failure_mode_Overheating** and **failure_mode_Mechanical Wear** → -0.30 |
| **failure_mode_Overheating** and **severity_Normal** → 0.29 | **failure_mode_Overheating** and **failure_mode_Oil Leakage** → -0.24 |
| **failure_mode_Overheating** and **severity_Requires Maintenance** → 0.26 | |

```python
# List of column pairs to correlate
weak_correlation_pairs = [('engine_temp', 'fuel_consumption'),
                          ('engine_temp', 'vibration_level'),
                          ('engine_load', 'coolant_temp'),
                          ('engine_load', 'exhaust_temp'),
                          ('fuel_consumption', 'engine_load'),
                          ('coolant_temp', 'engine_load'),
                          ('coolant_temp', 'exhaust_temp')]

# Function call
plot_correlations(data_one_hot_encode, column_pairs=weak_correlation_pairs, ncols=3, figsize=(18, 9))
```

![grafico_16.png](grafico_16.png)

**1.9.21 Engine Temperature vs. Fuel Consumption**

- There is a noticeable upward trend between engine temperature and fuel consumption. This may indicate that hotter engines consume more fuel, possibly due to increased power demand or reduced thermal efficiency at higher temperatures.
- However, significant data dispersion suggests that other factors also influence fuel consumption.

**1.9.22 Engine Temperature vs. Vibration Level**

- Engine vibration appears to be segmented into discrete values, which may indicate that the vibration sensor measures fixed levels rather than a continuous spectrum. This banding can also arise if continuous readings are truncated to integers before plotting, so it is worth confirming on the raw values.
- There is no clear correlation between engine temperature and vibration level, suggesting that other factors may have a stronger influence on vibration.

**1.9.23 Engine Load vs. Coolant Temperature**

- There is a clear positive correlation where engines under higher loads tend to have higher coolant temperatures. This makes sense, as more demanding engine conditions generate more heat, requiring greater cooling capacity.
- Data dispersion increases as load rises, suggesting that at high loads, cooling efficiency may vary depending on engine conditions.

**1.9.24 Engine Load vs. Exhaust Temperature**

- Exhaust gas temperature appears to reach a maximum threshold of around 450°C. This may indicate an operational limit of the exhaust system, where temperature is controlled to prevent damage.
- At lower loads, there is greater variation in exhaust gas temperature, suggesting that the engine may operate in different thermal modes depending on demand.

**1.9.25 Fuel Consumption vs. Engine Load**

- The relationship is positive, meaning that engines under higher loads require more fuel. The graph suggests a nonlinear pattern, where fuel consumption increases more rapidly at high loads.
- This can be explained by efficiency losses and increased mechanical resistance under heavy loads.

**1.9.26 Coolant Temperature vs. Exhaust Temperature**

- Exhaust gas temperature seems to reach a maximum limit regardless of increases in coolant temperature. This may indicate that the exhaust system has a thermal limit imposed by design, possibly to prevent overheating of the catalytic converter or other components.

By analyzing the results, we can observe that some variables exhibit a clear relationship with each other. However, no metric showed a strong correlation with any specific failure, except for `engine_temp`, which showed a clear relationship with `failure_mode_Overheating`. This suggests that `engine_temp` could be a key variable in predicting this type of failure.

It is important to note that the correlation analysis used the Spearman coefficient, which only captures monotonic relationships between pairs of variables. The absence of strong correlations between the failures and the other variables therefore does not imply that no relationship exists; the relationships may be more complex and non-monotonic. To investigate this possibility, an additional analysis is conducted using the **chi-square** test, complemented by ANOVA and Eta².

--DIVIDER--

```python
from scipy import stats
from scipy.stats import chi2_contingency


def analyze_categorical_numerical_relationships(data, target="failure_mode"):
    """
    Analyzes the relationship between numeric variables and a categorical variable using:
    - Chi-Square (Chi²) for discretized variables
    - ANOVA (F-stat) for differences in means
    - Eta² to measure the strength of the association
    - The mean of each variable per failure mode

    Returns a DataFrame with the results ordered by the strength of the relationship.
    """
    numeric_vars = data.select_dtypes(include=["number"]).columns.tolist()
    if target in numeric_vars:
        numeric_vars.remove(target)

    results = []

    # Get all unique failure modes
    failure_modes = data[target].unique()

    for var in numeric_vars:
        # Create bins (quartiles) for Chi-Square without modifying the input DataFrame
        binned = pd.qcut(data[var], q=4, duplicates="drop")

        # Chi-Square
        contingency_table = pd.crosstab(binned, data[target])
        chi2, p_chi2, _, _ = chi2_contingency(contingency_table)

        # ANOVA
        groups = [data[var][data[target] == cat] for cat in failure_modes]
        f_stat, p_anova = stats.f_oneway(*groups) if len(groups) > 1 else (np.nan, np.nan)

        # Eta² = between-group sum of squares / total sum of squares
        grand_mean = np.mean(data[var])
        ss_between = sum(len(group) * (np.mean(group) - grand_mean) ** 2 for group in groups)
        ss_total = sum((data[var] - grand_mean) ** 2)
        eta_squared = ss_between / ss_total if ss_total > 0 else 0

        # Mean of the variable for each failure mode
        failure_impacts = data.groupby(target)[var].mean()
        mode_columns = {f"{mode}": failure_impacts.get(mode, np.nan) for mode in failure_modes}

        results.append({
            "Variable": var,
            "Chi2": chi2,
            "p-value Chi2": p_chi2,
            "F-stat": f_stat,
            "p-value ANOVA": p_anova,
            "Eta²": eta_squared,
            **mode_columns  # Add failure mode columns
        })

    # Create the results DataFrame, sorted by Eta² (relationship strength)
    results_df = pd.DataFrame(results)
    results_df = results_df.sort_values(by="Eta²", ascending=False)

    return results_df
```

```python
# Calling the function
df_results = analyze_categorical_numerical_relationships(data, target="failure_mode")
display(df_results)
```

| Variable | Chi2 | p-value Chi2 | F-stat | p-value ANOVA | Eta² | No Failure | Overheating | Fuel Issues | Mechanical Wear | Oil Leakage |
|------------------------------|------------|-------------------|-------------|-----------------|---------|------------|-------------|-------------|----------------|-------------|
| oil_pressure | 1878.3569 | 0.000000e+00 | 1054.0622 | 0.000000e+00 | 0.4480 | 7.2022 | 7.8772 | 7.6105 | 7.3491 | 5.6889 |
| coolant_temp | 2569.7718 | 0.000000e+00 | 966.5436 | 0.000000e+00 | 0.4267 | 82.8286 | 97.0233 | 85.8531 | 83.8252 | 78.0033 |
| engine_temp | 2511.2935 | 0.000000e+00 | 930.3677 | 0.000000e+00 | 0.4174 | 82.9502 | 96.3285 | 86.2125 | 84.0975 | 77.7481 |
| engine_load | 2221.5095 | 0.000000e+00 | 690.0428 | 0.000000e+00 | 0.3470 | 45.0061 | 68.2987 | 57.6422 | 49.6289 | 25.9987 |
| fuel_consumption | 1881.6409 | 0.000000e+00 | 454.5458 | 0.000000e+00 | 0.2593 | 2742.2302 | 5195.7473 | 5310.7801 | 4954.3623 | 2311.3887 |
| running_period | 1024.8120 | 8.670252e-212 | 154.2039 | 7.507213e-125 | 0.1061 | 68.7582 | 85.3004 | 88.1446 | 111.7049 | 82.6876 |
| vibration_level | 352.9032 | 3.422920e-68 | 101.5193 | 2.372852e-83 | 0.0725 | 3.6957 | 3.9084 | 3.8088 | 3.7838 | 3.5002 |
| fuel_consumption_per_hour | 0.0000 | 1.000000e+00 | 96.2357 | 4.153684e-79 | 0.0690 | 100.3316 | 116.2123 | 159.6435 | 112.3262 | 107.1112 |
| exhaust_temp | 0.0000 | 1.000000e+00 | 10.3783 | 2.267933e-08 | 0.0079 | 449.7849 | 450.0000 | 449.9333 | 449.8326 | 449.5189 |
| rpm | 296.6099 | 2.417450e-56 | 1.1919 | 3.121149e-01 | 0.0009 | 1496.1979 | 1503.1674 | 1502.0570 | 1496.8134 | 1477.3714 |

--DIVIDER--

Based on the results of the association analysis between the numerical variables and the categorical variable `failure_mode`, we can draw some conclusions about how reliably each variable can be used to predict failures. Note that the `Chi2 = 0` and `p = 1` entries for `fuel_consumption_per_hour` and `exhaust_temp` most likely mean that quartile binning with `duplicates="drop"` collapsed these heavily tied variables into a single bin, which makes the chi-square test uninformative for them; the ANOVA and Eta² columns are the ones to read in those rows. Let's analyze the main points:

1.8.1 **Analysis of Eta² Values (Association Strength)**

- Eta² is a statistical measure of the strength of the association between a numerical variable and a categorical variable (in this case, the failure modes). It is the share of the numerical variable's total variance that is explained by the grouping, so it ranges from 0 to 1. The higher the Eta² value, the stronger the relationship between the numerical variable and the failure mode. This measure helps identify which variables are likely to be most predictive of failure, and thus guides the development of more effective predictive models.
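
As a minimal illustration of how Eta² is obtained, the sketch below computes it as the ratio between the between-group sum of squares and the total sum of squares. The helper name `eta_squared`, the toy values and the group labels are hypothetical and are not taken from the engine dataset.

```python
import numpy as np


def eta_squared(values, labels):
    """Share of the variance in `values` explained by the grouping in `labels`."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    ss_between = 0.0
    for group in np.unique(labels):
        group_values = values[labels == group]
        ss_between += group_values.size * (group_values.mean() - grand_mean) ** 2
    return ss_between / ss_total if ss_total > 0 else 0.0


# Toy data (assumption): two well-separated groups
toy_values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
toy_labels = ["a", "a", "a", "b", "b", "b"]

eta2 = eta_squared(toy_values, toy_labels)
print(f"Eta² = {eta2:.3f}")  # 54 / 58, i.e. about 0.931
```

Here the group means (2 and 8) are far apart relative to the spread inside each group, so almost all of the variance is explained by the grouping.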

**Variables with higher Eta² (high association strength):**

- **oil_pressure (0.448):**

Oil pressure shows the strongest association with failure modes, which makes it a very informative variable for predicting failures. A marked change in oil pressure can indicate mechanical failures or faults in the lubrication system, so this variable is essential for predictive maintenance.

- **coolant_temp (0.427) and engine_temp (0.417):**

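Coolant and engine temperatures are also strongly associated with failure modes. Monitoring them can be essential to predict failures related to overheating, cooling system faults or engine performance problems. Both help identify malfunctions caused by rising temperature and can be monitored to prevent severe damage.

- **engine_load (0.347) and fuel_consumption (0.259):**

Engine load and fuel consumption also show a substantial association with failure modes. Average fuel consumption nearly doubles in the `Overheating`, `Fuel Issues` and `Mechanical Wear` modes compared with `No Failure`, and engine load is highest for `Overheating` and lowest for `Oil Leakage`.

1.8.1.1 **Variables with intermediate Eta² (moderate association):**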

- **vibration_level (0.0725):**

With an Eta² of 0.0725, the vibration level shows only a modest association with failure modes in this dataset. Vibration is nevertheless a widely used indicator of problems in mechanical components, such as bearing wear or failures in moving parts, so it remains worth monitoring.

- **running_period (0.1061):**

The running period, that is, the equipment's operating time, shows a moderate relationship with the failure variable. Its average is highest for `Mechanical Wear` (111.7, against 68.8 for `No Failure`), which is consistent with wear accumulating over long periods of continuous operation.

1.8.1.2 **Variables with lower Eta² (weak association strength):**

- **fuel_consumption_per_hour (0.069):**

The variable `fuel_consumption_per_hour` has a weak association with failure modes (comparable to that of `vibration_level`), suggesting that, by itself, it is not a good predictor of failures. However, in more complex models it can provide additional information about overall system performance, especially when combined with more predictive variables.

- **exhaust_temp (0.008):**

Exhaust gas temperature has one of the weakest associations among the variables analyzed; only `rpm` (Eta² = 0.0009) is lower. In isolation, neither is a good predictor of failures. The average exhaust temperature is almost identical across failure modes (449.5 to 450.0), which suggests that this measurement is not sensitive enough to detect early failure unless it is part of a more complex monitoring system.

1.8.2 **Analysis of Significance Tests**

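- **Chi-Square**: For the most relevant variables, such as `oil_pressure`, `coolant_temp` and `engine_temp`, the p-value is effectively zero (it is reported as `0.000000e+00` because it falls below floating-point precision). This indicates a statistically significant association between the quartile-binned variable and the failure mode. Significance alone does not measure predictive strength: with several thousand observations even weak effects are significant, so the p-values should be read together with Eta². Note also that `fuel_consumption_per_hour` and `exhaust_temp` show Chi2 = 0 with p = 1. A plausible cause is that their quartile edges coincided and `duplicates="drop"` left a single bin, in which case this result is a binning artifact rather than evidence of no association.

- **ANOVA (F-stat)**: The low p-values indicate that the means of variables such as `oil_pressure` and `engine_temp` differ significantly between failure modes. The exception is `rpm` (p ≈ 0.31), whose means do not differ significantly.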
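
To illustrate how quartile binning can make the Chi-Square test degenerate, here is a minimal sketch on hypothetical data. The readings and labels are invented; the only assumption is a sensor that sits at the same value almost all the time, so that the quartile edges coincide.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical sensor that stays at its upper limit almost all the time
readings = pd.Series([440.0] + [450.0] * 9)
modes = pd.Series(["No Failure"] * 5 + ["Overheating"] * 5)

# Every quartile edge above the minimum is 450, so after dropping
# duplicate edges only a single bin is left
binned = pd.qcut(readings, q=4, duplicates="drop")
table = pd.crosstab(binned, modes)

chi2, p_value, dof, _ = chi2_contingency(table)
print("bins:", binned.nunique(), "| table shape:", table.shape)
print("chi2:", chi2, "| p-value:", p_value, "| dof:", dof)
```

With a single row in the contingency table there are zero degrees of freedom, so the test cannot detect any association regardless of how the variable behaves. Checking `binned.nunique()` before running the test is a simple way to flag this situation.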

1.8.3 **Impact of Failure Modes on Variables**

- Calculating averages per failure mode for each numerical variable shows how the variables differ between failure modes. For example, `oil_pressure` averages 5.69 for `Oil Leakage`, well below the 7.20 observed for `No Failure`, and is highest for `Overheating` (7.88), so a drop in this variable is an important indicator of leakage-related failures. `fuel_consumption` and `engine_temp` also show clear differences in their averages across failure modes, which reinforces their relevance for predicting failures.
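
One way to read the per-mode averages is to ask, for each variable, which failure mode moves furthest from the `No Failure` baseline. The sketch below does this for four rows copied from the results table; the helper name `most_deviant_mode` is hypothetical.

```python
# Mean value per failure mode, copied from the results table
mode_means = {
    "oil_pressure": {"No Failure": 7.2022, "Overheating": 7.8772, "Fuel Issues": 7.6105,
                     "Mechanical Wear": 7.3491, "Oil Leakage": 5.6889},
    "coolant_temp": {"No Failure": 82.8286, "Overheating": 97.0233, "Fuel Issues": 85.8531,
                     "Mechanical Wear": 83.8252, "Oil Leakage": 78.0033},
    "engine_load": {"No Failure": 45.0061, "Overheating": 68.2987, "Fuel Issues": 57.6422,
                    "Mechanical Wear": 49.6289, "Oil Leakage": 25.9987},
    "running_period": {"No Failure": 68.7582, "Overheating": 85.3004, "Fuel Issues": 88.1446,
                       "Mechanical Wear": 111.7049, "Oil Leakage": 82.6876},
}


def most_deviant_mode(means, baseline="No Failure"):
    """Return the failure mode furthest from the baseline mean, with the signed gap."""
    gaps = {mode: value - means[baseline] for mode, value in means.items() if mode != baseline}
    mode = max(gaps, key=lambda m: abs(gaps[m]))
    return mode, gaps[mode]


for variable, means in mode_means.items():
    mode, gap = most_deviant_mode(means)
    print(f"{variable}: {mode} ({gap:+.2f} vs. No Failure)")
```

For these rows the largest gaps line up with physical intuition: oil pressure drops most under `Oil Leakage`, coolant temperature and engine load rise most under `Overheating`, and the running period is longest under `Mechanical Wear`.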

Variables with a strong association with failure modes, such as `oil_pressure`, `coolant_temp`, `engine_temp` and `engine_load`, are the most promising inputs for failure prediction. They differentiate the failure modes well and have a strong association with the categorical variable, which makes them suitable for a prediction model. Variables with weak association, on the other hand, should be discarded or treated with caution when building the model. In summary, a failure prediction model is more likely to be robust if it is built on variables that show a strong relationship with failure modes, such as temperatures and oil pressure.

--DIVIDER--

## **2. Machine Learning Model Training and Evaluation**

In this section, we focus on developing and evaluating a machine learning model to predict potential failures based on the collected data. The goal is to leverage the identified correlations and patterns to build a reliable predictive model that can assist in early fault detection and maintenance planning.

We will begin by preparing the dataset, including feature selection and preprocessing steps. Next, we will train different machine learning models and assess their performance using appropriate evaluation metrics. Finally, we will analyze the results to determine the most effective model for failure prediction and discuss potential improvements.

### **2.1 Dataset Preprocessing**

Before training the machine learning model, it is essential to prepare the dataset to ensure that it is clean, structured, and suitable for analysis. This preprocessing step involves selecting relevant features, handling missing values, and splitting the dataset into training and testing sets.

```python
# Adjusting the dataset for model training:
data_1 = data[['timestamp', 'engine_id', 'engine_temp', 'oil_pressure',
               'fuel_consumption', 'vibration_level', 'rpm', 'engine_load',
               'coolant_temp', 'exhaust_temp', 'running_period',
               'fuel_consumption_per_hour', 'engine_type', 'fuel_type',
               'manufacturer', 'failure_mode', 'severity']]
```

First, a subset of the original dataset is created with the columns of interest. The target variable is `failure_mode`. The columns `severity`, `timestamp`, `manufacturer`, `engine_type`, `fuel_type`, `engine_id`, `coolant_temp`, `fuel_consumption_per_hour` and `exhaust_temp` are then dropped from the inputs to prevent data leakage and reduce redundancy. This leaves seven input features: `engine_temp`, `oil_pressure`, `fuel_consumption`, `vibration_level`, `rpm`, `engine_load` and `running_period`.

```python
# Assuming 'failure_mode' is the target variable
X = data_1.drop(['failure_mode', 'severity', 'timestamp', 'manufacturer',
                 'engine_type', 'fuel_type', 'engine_id', 'coolant_temp',
                 'fuel_consumption_per_hour', 'exhaust_temp'], axis=1)  # Input variables

# Separating the targets
y = data_1['failure_mode']
```

Next, the dataset was divided into training and testing sets using a 70/30 split, with stratified sampling so that the class distribution is maintained. To better understand the class balance, pie charts were generated to visualize the distribution of failure modes in the original dataset and the training set.

```python
# Splitting the data for the failure_mode model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
```

```python
# Function to plot pie charts
def plot_pie_chart(data, title, ax):
    data.value_counts().plot(kind='pie', autopct='%1.1f%%', startangle=90, ax=ax)
    ax.set_ylabel('')
    ax.set_title(title)

# Creating side-by-side subplots
fig, axs = plt.subplots(1, 2, figsize=(12, 6))

# Plotting the distribution of 'failure_mode' classes
plot_pie_chart(y, 'Original Distribution - Failure Mode', axs[0])
plot_pie_chart(y_train, 'Training Distribution - Failure Mode', axs[1])

# Adjusting the layout
plt.tight_layout()
plt.show()
```

![grafico_17.png](grafico_17.png)

--DIVIDER--

### **2.2 Model Training**

With the dataset preprocessed, the next step is to train a machine learning model to predict engine failure modes. The training process involves selecting an appropriate model, defining hyperparameters, and fitting the model to the training data.

The goal is to develop a predictive model that can accurately classify different failure modes based on key engine parameters such as engine temperature, oil pressure, fuel consumption, vibration levels, and RPM. Various machine learning algorithms may be tested to determine the most effective approach, considering metrics such as accuracy, precision, recall, and F1-score.
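
For reference, the sketch below shows how accuracy, precision, recall and F1-score follow from a confusion matrix. The 3-class matrix is hypothetical and is not one of the results of this study.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows are true classes, columns are predictions
cm = np.array([
    [5, 1, 0],
    [1, 4, 1],
    [0, 2, 6],
])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # correct / everything predicted as the class
recall = true_positives / cm.sum(axis=1)     # correct / everything that truly is the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()
macro_f1 = f1.mean()                         # unweighted mean over classes

for cls in range(cm.shape[0]):
    print(f"class {cls}: precision={precision[cls]:.2f} recall={recall[cls]:.2f} f1={f1[cls]:.2f}")
print(f"accuracy={accuracy:.2f} macro F1={macro_f1:.2f}")
```

Macro averaging gives every class the same weight, so a class with few samples and poor recall pulls the macro score down even when overall accuracy stays high.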

To evaluate how well each model predicts engine failure modes, confusion matrices are used. A confusion matrix breaks down a model's predictions, showing the number of correctly and incorrectly classified instances for each failure mode. The function below trains each model, generates predictions on the test set, and plots the confusion matrices side by side. This makes it easy to compare models and to see which failure modes are confused with one another, which in turn guides model selection.

```python
# Function to plot confusion matrices for analyzing results
def plot_confusion_matrices(models, X_train, y_train, X_test, y_test, class_names):
    fig, axes = plt.subplots(1, len(models), figsize=(18, 6))
    colors = ['Blues', 'Greens', 'Reds']  # one colormap per model (three models)

    for i, (name, model) in enumerate(models.items()):
        # Creating the pipeline for the current model
        # (`preprocessor` is the ColumnTransformer defined in the training cell)
        pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', model)])

        # Training the model
        pipeline.fit(X_train, y_train)

        # Making predictions
        y_pred = pipeline.predict(X_test)

        # Calculating the confusion matrix
        cm = confusion_matrix(y_test, y_pred)

        # Plotting the confusion matrix
        sns.heatmap(cm, annot=True, fmt='d', cmap=colors[i],
                    xticklabels=class_names, yticklabels=class_names,
                    ax=axes[i], cbar=False)
        axes[i].set_title(f'Prediction Matrix - {name}')
        axes[i].set_xlabel('Predicted')
        axes[i].set_ylabel('Actual')

    plt.tight_layout()
    plt.show()
```

With the models selected, the next step is to train, evaluate, and compare them. The target variable (`failure_mode`) is first encoded with **Label Encoding**, which converts the categorical failure types into numerical values.
Then, a preprocessing pipeline is defined to handle missing values and normalize numerical features, ensuring that models receive properly scaled data.

Three different models are trained and evaluated:

- `Random Forest`
- `Logistic Regression`
- `XGBoost`
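
The class numbers that appear in the classification reports come from the label-encoding step. As a small sketch, scikit-learn's `LabelEncoder` assigns integer codes in sorted (alphabetical) order of the labels, which is why the list of class names used for plotting has to follow that same order. The input list below simply enumerates the five failure modes of this dataset.

```python
from sklearn.preprocessing import LabelEncoder

failure_modes = ["No Failure", "Overheating", "Fuel Issues", "Mechanical Wear", "Oil Leakage"]

encoder = LabelEncoder().fit(failure_modes)

# classes_ holds the labels in the order of their integer codes
for code, name in enumerate(encoder.classes_):
    print(code, name)

print(encoder.transform(["Overheating"])[0])  # integer code of one label
print(encoder.inverse_transform([2])[0])      # label of one integer code
```

Reading the names from `encoder.classes_` instead of typing them by hand avoids any mismatch between the encoded targets and the axis labels of the confusion matrices.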

Each model is trained using the preprocessed dataset, and its performance is assessed using classification reports, cross-validation scores, and confusion matrices. Additionally, a bar chart is generated to compare key metrics (Precision, Recall, and F1-Score) across models. This evaluation helps determine the most effective model for predicting engine failures based on sensor data.

```python
# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Encoding the classes
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

# Defining basic preprocessing (imputation and normalization)
preprocessor = ColumnTransformer(transformers=[
    ('num', Pipeline([('imputer', SimpleImputer(strategy='mean')),
                      ('scaler', StandardScaler())]), X_train.columns)
])

# Models to compare
models = {
    'Random Forest': RandomForestClassifier(n_estimators=150, random_state=42),
    'Logistic Regression': LogisticRegression(random_state=42),
    'XGBoost': XGBClassifier(random_state=42)
}

# List to store metrics
metrics_data = []

# Class names, in the alphabetical order used by LabelEncoder
class_names = ['Fuel Issues', 'Mechanical Wear', 'No Failure', 'Oil Leakage', 'Overheating']

# Training and evaluating each model
for name, model in models.items():
    pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', model)])

    # Training the model with the encoded classes
    pipeline.fit(X_train, y_train_encoded)

    # Making predictions with the encoded classes
    y_pred_encoded = pipeline.predict(X_test)

    # Get the classification report as a dictionary
    report = classification_report(y_test_encoded, y_pred_encoded, output_dict=True)

    # Evaluating the model
    print(f"\n{name} - Classification Report:")
    print(classification_report(y_test_encoded, y_pred_encoded))

    # Calculating cross-validation score
    cv_scores = cross_val_score(pipeline, X_train, y_train_encoded, cv=5)
    print(f"{name} - Cross Validation Score: {cv_scores.mean():.4f} (+/- {cv_scores.std():.4f})\n")

    # Store metrics (macro average)
    metrics_data.append({'Model': name,
                         'Precision': report['macro avg']['precision'],
                         'Recall': report['macro avg']['recall'],
                         'F1-Score': report['macro avg']['f1-score']})

# Plotting the confusion matrices
plot_confusion_matrices(models, X_train, y_train_encoded, X_test, y_test_encoded, class_names)

# Create DataFrame for visualization
df_metrics = pd.DataFrame(metrics_data).melt(id_vars=['Model'], var_name='Metric', value_name='Value')

# Create bar chart figure
plt.figure(figsize=(10, 6))
ax = sns.barplot(x='Metric', y='Value', hue='Model', data=df_metrics, palette='viridis')

# Add values to the top of the bars
for p in ax.patches:
    ax.annotate(f'{p.get_height():.2f}',
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='bottom', fontsize=10, fontweight='bold', color='black')

# Chart settings
plt.ylim(0, 1)
plt.title('Comparison of Performance Metrics')
plt.ylabel('Value')
plt.xlabel('Metric')
plt.legend(title='Model')
plt.show()
```

--DIVIDER--

**Random Forest - Classification Report:**

| Class | Precision | Recall | F1-score | Support |
|------------------|-----------|--------|----------|----------|
| 0 | 0.94 | 0.96 | 0.95 | 291 |
| 1 | 0.96 | 0.98 | 0.97 | 313 |
| 2 | 0.96 | 1.00 | 0.98 | 650 |
| 3 | 0.99 | 1.00 | 1.00 | 113 |
| 4 | 0.94 | 0.73 | 0.82 | 193 |
| **Accuracy** | | | **0.95** | **1560** |
| **Macro Avg** | 0.96 | 0.93 | 0.94 | 1560 |
| **Weighted Avg** | 0.95 | 0.95 | 0.95 | 1560 |

**Random Forest - Cross Validation Score:** 0.9451 (+/- 0.0047)

**Logistic Regression - Classification Report:**

| Class | Precision | Recall | F1-score | Support |
|------------------|-----------|--------|----------|----------|
| 0 | 0.58 | 0.43 | 0.50 | 291 |
| 1 | 0.49 | 0.34 | 0.40 | 313 |
| 2 | 0.72 | 0.90 | 0.80 | 650 |
| 3 | 0.96 | 0.97 | 0.96 | 113 |
| 4 | 0.81 | 0.84 | 0.82 | 193 |
| **Accuracy** | | | **0.70** | **1560** |
| **Macro Avg** | 0.71 | 0.70 | 0.70 | 1560 |
| **Weighted Avg** | 0.68 | 0.70 | 0.68 | 1560 |

**Logistic Regression - Cross Validation Score:** 0.6898 (+/- 0.0132)

**XGBoost - Classification Report:**

| Class | Precision | Recall | F1-score | Support |
|------------------|-----------|--------|----------|----------|
| 0 | 0.93 | 0.97 | 0.95 | 291 |
| 1 | 0.96 | 0.97 | 0.96 | 313 |
| 2 | 0.97 | 0.99 | 0.98 | 650 |
| 3 | 0.99 | 1.00 | 1.00 | 113 |
| 4 | 0.89 | 0.74 | 0.80 | 193 |
| **Accuracy** | | | **0.95** | **1560** |
| **Macro Avg** | 0.95 | 0.93 | 0.94 | 1560 |
| **Weighted Avg** | 0.95 | 0.95 | 0.95 | 1560 |

**XGBoost - Cross Validation Score:** 0.9415 (+/- 0.0053)

![grafico_18.png](grafico_18.png)

![grafico_19.png](grafico_19.png)

--DIVIDER--

### **2.3 Analysis of Results**

- **Random Forest** and **XGBoost** have almost identical performance, with **95%** test accuracy and mean cross-validation scores of **0.9451** and **0.9415**, respectively.

- **XGBoost** has marginally higher recall for class 4 (`Overheating`): 0.74 vs. 0.73. Its precision (0.89 vs. 0.94) and F1-score (0.80 vs. 0.82) for that class are lower than Random Forest's, and its cross-validation score is slightly lower.

- **Logistic Regression** performs significantly worse, with 70% accuracy and lower macro-averaged precision, recall and F1-score (about 0.70 each).
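
As a quick consistency check, the macro-averaged F1-scores can be recomputed from the per-class values in the classification reports. The numbers below are copied from those reports (already rounded to two decimals), so the result is only approximate.

```python
# Per-class F1-scores copied from the classification reports (classes 0 to 4)
f1_by_model = {
    "Random Forest": [0.95, 0.97, 0.98, 1.00, 0.82],
    "Logistic Regression": [0.50, 0.40, 0.80, 0.96, 0.82],
    "XGBoost": [0.95, 0.96, 0.98, 1.00, 0.80],
}

# Macro average: unweighted mean of the per-class scores
macro_f1_by_model = {model: sum(scores) / len(scores) for model, scores in f1_by_model.items()}

for model, macro_f1 in macro_f1_by_model.items():
    class_4_f1 = f1_by_model[model][4]
    print(f"{model}: macro F1 = {macro_f1:.2f}, class 4 F1 = {class_4_f1:.2f}")
```

The recomputed values agree with the `Macro Avg` rows (0.94, 0.70 and 0.94), and they show that class 4 is what separates the two tree-based models.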

**Random Forest - Prediction Matrix**

- Classes 0, 1, 2 and 3 are well classified (high values on the diagonal).

- Class 4 (Overheating) is the most difficult: 52 of its 193 instances were misclassified (16 as class 0, 13 as class 1 and 23 as class 2), which matches the recall of 0.73.

**Logistic Regression - Prediction Matrix**

Frequent confusion among classes 0, 1 and 2:

- Class 0 is often classified as 1 and 2.

- Class 1 also has high confusion with 2.

- Class 3 has excellent performance (almost 100% hits).

- Class 4 has higher recall than with Random Forest (0.84 vs. 0.73) but lower precision (0.81 vs. 0.94), so the F1-score is the same (0.82). The model fails often in the other classes.

**XGBoost - Prediction Matrix**

- Classes 0, 1, 2 and 3 have very high performance.

- Class 4 recall is marginally higher than with Random Forest (0.74 vs. 0.73), but precision is lower (0.89 vs. 0.94), so Overheating remains the hardest class.

- Less overall confusion compared to Logistic Regression.

**Which model to use?**

1st Place: **Random Forest** or **XGBoost**

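Both are very close in performance, with the same test accuracy (0.95). **Random Forest** has a slightly higher F1-score for class 4 (0.82 vs. 0.80) and a slightly higher cross-validation score (0.9451 vs. 0.9415), while **XGBoost** has marginally higher recall for that class (0.74 vs. 0.73). Inference time was not measured here; if it is an important factor, both models should be benchmarked on the target hardware before choosing.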

3rd Place: **Logistic Regression**

Much inferior to the other models, with frequent errors in classes 0, 1 and 2. Its recall for class 4 (0.84) is the highest of the three models, but this does not compensate for the overall poor performance.

--DIVIDER--

## **3. Creating AI Agents for Fault Prediction and Motor Control**

In this section, we simulate the development of AI agents for fault prediction and motor control. Using machine learning techniques, we model a system capable of detecting potential failures based on sensor data and simulating automated decision-making to optimize motor performance.

Although this is not a real-world implementation, the simulation provides valuable insights into how AI can be used to improve predictive maintenance and control strategies in industrial environments.
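
To make the prediction step concrete, the sketch below shows how a fitted scikit-learn pipeline could map one sensor reading to a failure-mode name. It is not the trained model from Section 2: `stand_in_pipeline` is fitted on random data purely so the example runs, and the helper `predict_failure` and the sample reading are hypothetical. The feature list matches the seven inputs kept in Section 2.1.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["engine_temp", "oil_pressure", "fuel_consumption", "vibration_level",
            "rpm", "engine_load", "running_period"]
CLASS_NAMES = ["Fuel Issues", "Mechanical Wear", "No Failure", "Oil Leakage", "Overheating"]

# Stand-in for the trained pipeline (assumption): random inputs, cyclic labels 0..4
rng = np.random.default_rng(0)
X_fake = pd.DataFrame(rng.normal(size=(200, len(FEATURES))), columns=FEATURES)
y_fake = np.arange(200) % len(CLASS_NAMES)
stand_in_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])
stand_in_pipeline.fit(X_fake, y_fake)


def predict_failure(pipeline, reading):
    """Map one sensor reading (a dict) to a failure-mode name and class probabilities."""
    row = pd.DataFrame([reading])[FEATURES]  # enforce the training column order
    probabilities = pipeline.predict_proba(row)[0]
    return CLASS_NAMES[int(np.argmax(probabilities))], probabilities


# Hypothetical sensor reading
reading = {"engine_temp": 96.0, "oil_pressure": 7.8, "fuel_consumption": 5200.0,
           "vibration_level": 3.9, "rpm": 1500.0, "engine_load": 68.0, "running_period": 85.0}

mode, probabilities = predict_failure(stand_in_pipeline, reading)
print(mode, probabilities.round(2))
```

Returning the class probabilities along with the label lets a downstream agent report how confident the prediction is, not just which failure mode was selected.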

### **3.1 Set up the Simulation Environment**

To set up the simulation environment, we load the necessary configurations, including the machine learning model and additional tools required for fault prediction. Below is the code snippet that initializes the system:

```python
# Load variables from .env file
load_dotenv()

# Configure LLM
llm = ChatOpenAI(model="gpt-4", temperature=0, openai_api_key=os.getenv("OPENAI_API_KEY"))

# Loading Agent tool:
maintenance_guide_tool = DOCXSearchTool(docx='Engine_1.docx')

# Load the machine learning model
loaded_pipeline = joblib.load("random_forest_motor.pkl")

class_names = ['Fuel Issues', 'Mechanical Wear', 'No Failure', 'Oil Leakage', 'Overheating']

predictions = {}  # Store predictions from fault_predictor
sensor_data_history = []  # Stores sensor data history
```

--DIVIDER--

### **3.2 Creating CrewAI Agents for Engine Monitoring and Maintenance**

In this section, we define the key agents responsible for monitoring and predicting engine behavior, as well as providing maintenance recommendations. These agents interact with the simulation environment to ensure engine performance is optimized and potential failures are identified early. Below is the code for creating the agents:

```python
# Creating CrewAI Agents
real_time_monitor = Agent(
    role="Real-Time Engine Data Monitor",
    goal="Continuously collect and visualize engine sensor data in real time.",
    backstory='''
    You are a real-time monitoring agent responsible for tracking engine performance metrics,
    generating time-series visualizations, and detecting anomalies.
    Your mission is to ensure that engine parameters remain within safe operational limits.
    ''',
    llm=llm,
    verbose=True
)

fault_predictor = Agent(
    role="Engine Fault Prediction Specialist",
    goal="Analyze sensor data and predict potential engine failures using machine learning.",
    backstory='''
    You are an AI-driven diagnostic agent trained to identify early signs of engine malfunctions.
    Using a pre-trained predictive model, you assess real-time data and provide probability-based
    failure classifications, helping to prevent catastrophic failures.''',
    llm=llm,
    verbose=True
)

reporting_analyst = Agent(
    role="Engine Performance Reporting Analyst",
    goal="Create detailed reports based on engine data, diagnostics, and predictive analysis.",
    backstory='''You are a meticulous analyst responsible for transforming complex engine data
    into actionable insights. You generate comprehensive reports that include sensor data trends,
    predicted failures, and operational recommendations to support decision-making.''',
    llm=llm,
    verbose=True
)

maintenance_advisor = Agent(
    role="Engine Maintenance Advisor",
    goal="Provide recommendations for preventive maintenance and repairs based on sensor data.",
    backstory='''You are a proactive maintenance agent that leverages real-time monitoring and
    fault predictions to suggest optimal maintenance schedules.
    Your goal is to reduce downtime and enhance engine longevity.''',
    llm=llm,
    verbose=True,
    tools=[maintenance_guide_tool]
)
```

--DIVIDER--

## **3.3 EngineState Class for Engine Behavior Simulation**

In this section, the `EngineState` class is defined to simulate the physical behavior of an engine over time, considering the dynamic degradation of its components and the effects of continuous use. To represent this behavior, several critical variables are modeled, such as `temperature`, `oil pressure`, `fuel consumption`, `vibration`, and the general health of the engine. Each of these variables evolves over time according to the engine's dynamics and the degradation of its components.

The engine behavior can be described by a set of equations modeling the physical variables of interest:

### **3.3.1 Engine Temperature (T(t)):**

The engine temperature is influenced by the heat generated by the engine's rotation and load, as well as the cooling efficiency, which tends to degrade over time. The differential equation that describes this evolution is given by:

$$\frac{dT(t)}{dt} = \alpha \cdot \left( \frac{\Omega}{K_{1}} \cdot \frac{L}{K_{2}} - C \cdot K_{3} \right)$$

Where:

- **$T(t)$**: The engine temperature at time $t$, a time-dependent function.
- **$\alpha$**: The thermal dissipation adjustment factor. This factor models the engine's efficiency in dissipating heat, which may depend on various conditions such as engine characteristics, material types, or environmental factors.
- **$\Omega$**: The engine speed (revolutions per minute). A higher engine speed generates more heat due to friction and other internal forces.
- **$L$**: The load applied to the engine, which also contributes to heat generation. A higher load increases the heat produced by the engine.
- **$C$**: The efficiency of the engine's cooling system.
A higher value indicates a more efficient cooling system, meaning the engine will cool more effectively, leading to a slower increase in temperature.
- **$K_{1}$**: A constant scale factor that converts engine speed from revolutions per minute to a more appropriate unit within the equation.
- **$K_{2}$**: A scaling factor for the engine load, adjusted to fit the time units in the equation.
- **$K_{3}$**: An adjustment factor related to the cooling system's efficiency, derived from experimental measurements or theoretical considerations of thermal dissipation.

#### **Physical Behavior:**

This equation aims to capture the physical behavior of engine temperature based on two main factors:

- **Heat Generation**: Proportional to the engine speed ($\Omega$) and load ($L$). As the engine speed increases or the load is higher, the engine generates more heat.
- **Heat Dissipation**: Heat dissipation is modeled by the cooling efficiency ($C$). When the cooling system is more efficient, the temperature increase due to rotation and load is more moderate.

The temperature is constrained within a safe range:

$$ T_{min} \leq T(t) \leq T_{max} $$

### **Relation to the Literature and Empirical Sources:**

The presented equation is a simplified representation of the thermal behavior of an engine, commonly used in dynamic system simulations or physical modeling of engines. The general structure of the equation, which relates engine speed, load, and cooling efficiency, is frequently used in thermodynamic modeling and thermal control systems.

The constant values (e.g., 1000 for $K_{1}$, 20 for $K_{2}$, and 15 for $K_{3}$) are typically based on empirical observations or experimental calibration. Although these constants may not have an exact derivation in the literature, they are tuned to reflect the specific characteristics of a given engine. Some sources that discuss the underlying principles and provide context for these types of empirical parameters include:

- **John B.
Heywood, *Internal Combustion Engine Fundamentals*** This book provides a comprehensive discussion of engine performance, including the heat generation aspects that relate to engine speed and load.
- **Frank P. Incropera & David P. DeWitt, *Fundamentals of Heat and Mass Transfer*** This text covers the principles of heat transfer and dissipation, which help inform the selection of scaling factors in thermal models.
- **Yunus A. Çengel & Michael A. Boles, *Thermodynamics: An Engineering Approach*** This work offers insights into thermodynamic principles in engineering systems, including discussions on scaling and calibration of empirical constants.
- **Additional Technical Papers and Experimental Studies on Engine Thermal Modeling:** Many research articles and theses in the field of engine modeling discuss the calibration of parameters for predicting thermal behavior. These studies often use experimental data to adjust models similar to the one presented here.

In summary, while the constants $K_{1}$, $K_{2}$, and $K_{3}$ are empirically derived and may vary from one engine to another, their selection is supported by the principles outlined in the aforementioned literature. The approach of modeling thermal dissipation using a factor like $\alpha$ is common in both theoretical and experimental studies of engine dynamics, so the model provides a simplified yet reasonably accurate representation of an engine's thermal behavior.

### **3.3.2 Oil Pressure (P(t))**:

The oil pressure depends on the oil viscosity and the engine's RPM. Its equation is modeled as:

$$ \frac{dP(t)}{dt} = \beta \cdot \left( \frac{\Omega}{K_{4}} \cdot \mu \right) $$

Where:

- **$P(t)$**: The oil pressure at time $t$.
- **$\beta$**: An adjustment factor for pressure dissipation. This factor accounts for how effectively pressure fluctuations are dampened by the engine's lubrication system.
- **$\Omega$**: The engine speed (revolutions per minute).
A higher RPM generally increases the oil pressure due to faster movement of the oil.
- **$\mu$**: The oil viscosity, a measure of the oil's resistance to flow. A higher viscosity typically results in higher oil pressure.
- **$K_{4}$**: An empirical scaling factor that converts the RPM value into a unit suitable for the equation.

The pressure is also restricted to a safe range:

$$ P_{min} \leq P(t) \leq P_{max} $$

This equation is a simplified representation of oil pressure dynamics in an engine, similar to models used in lubrication system studies and engine simulations. The structure, which links RPM and oil viscosity to oil pressure, is common in thermodynamic and mechanical system analyses.

The constant value of $K_{4}$ (e.g., 500) and the factor $\beta$ are typically derived from empirical observations or experimental calibrations. Although these values may not appear verbatim in the literature, they are adjusted based on practical testing and data from engine performance studies. Some sources that provide background on these empirical approaches include:

- **John B. Heywood, *Internal Combustion Engine Fundamentals*** Offers insights into engine performance parameters, including lubrication and pressure generation.
- **Frank P. Incropera & David P. DeWitt, *Fundamentals of Heat and Mass Transfer*** Discusses heat and mass transfer principles which underpin many empirical scaling factors used in engine modeling.
- **Yunus A. Çengel & Michael A. Boles, *Thermodynamics: An Engineering Approach*** Provides an understanding of how empirical constants are used in thermal and mechanical system models.
- **Research articles on engine lubrication and pressure dynamics:** Numerous studies and technical papers detail the calibration of parameters like $\beta$ and the scaling factors for converting RPM measurements, validating the empirical nature of such models.

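
The temperature and oil-pressure equations above can be integrated numerically. The following is a minimal sketch using explicit Euler steps with clamping to the safe ranges. It is not the `EngineState` implementation: the constants $K_{1}$ to $K_{4}$ are the example values quoted in the text, while `ALPHA`, `BETA`, the range limits, the time step and the operating point are assumed purely for illustration.

```python
K1, K2, K3, K4 = 1000.0, 20.0, 15.0, 500.0  # example constants quoted in the text

# Assumed, illustrative values (not taken from the source)
ALPHA, BETA = 0.1, 0.01
T_MIN, T_MAX = 20.0, 120.0
P_MIN, P_MAX = 1.0, 10.0


def clamp(value, low, high):
    """Keep a value inside the [low, high] range."""
    return max(low, min(high, value))


def euler_step(temp, pressure, rpm, load, cooling, viscosity, dt=1.0):
    """Advance temperature and oil pressure by one explicit Euler step."""
    d_temp = ALPHA * ((rpm / K1) * (load / K2) - cooling * K3)
    d_pressure = BETA * ((rpm / K4) * viscosity)
    new_temp = clamp(temp + dt * d_temp, T_MIN, T_MAX)
    new_pressure = clamp(pressure + dt * d_pressure, P_MIN, P_MAX)
    return new_temp, new_pressure


temp, pressure = 80.0, 5.0
for _ in range(10):
    temp, pressure = euler_step(temp, pressure, rpm=1500, load=50, cooling=0.2, viscosity=1.0)

print(f"T = {temp:.2f}, P = {pressure:.2f}")
```

With these assumed values the heat-generation term (3.75) slightly exceeds the dissipation term (3.0), so the temperature rises slowly. Because the right-hand side of the pressure equation is never negative, the pressure only increases until it reaches the upper limit, which is where the clamping matters.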

In summary, while the constants and factors in this equation are empirically derived, they are rooted in the principles discussed in the above literature, so the model captures the essential behavior of the engine's oil pressure dynamics.

### **3.3.3 Fuel Consumption ($C_{o}(t)$):**

Fuel consumption is influenced by the engine's RPM, load, and the health of the fuel injector. These factors interact to determine the amount of fuel consumed by the engine over time. The equation used to model fuel consumption is given by:

$$ C_{o}(t) = \left( \frac{\Omega}{K_{5}} \cdot L \right) \cdot \left(1 + \left(1 - H\right)\right) $$

Where:

- **$C_{o}(t)$**: The fuel consumption at time $t$. This represents the quantity of fuel consumed per unit time, typically measured in liters per hour (L/h).
- **$\Omega$**: The engine speed in revolutions per minute (RPM). Higher engine speeds generally lead to increased fuel consumption due to higher internal friction and greater energy demand.
- **$K_{5}$**: A constant scaling factor used to convert engine speed to the appropriate unit for the model. The value of $K_{5}$ is determined based on the specific characteristics of the engine and the units used for measuring consumption.
- **$L$**: The load applied to the engine. This reflects the amount of work the engine is performing at any given moment (e.g., the weight of a vehicle or the force required to move a load). Higher loads necessitate more fuel to sustain engine performance.
- **$H$**: The health of the fuel injector, expressed as a value between 0 and 1. A value closer to 1 indicates a healthy, efficient injector, whereas a value closer to 0 indicates poor injector health, which leads to less efficient fuel injection and increased fuel consumption.

- **Engine Speed and Load Interaction:** The term $\left(\frac{\Omega}{K_{5}} \cdot L\right)$ represents the baseline fuel consumption determined by the engine's operational conditions.
As both the engine speed ($\Omega$) and the load ($L$) increase, the baseline consumption increases proportionally.
- **Adjustment for Injector Health:** The multiplier $\left(1 + \left(1 - H\right)\right)$ adjusts the baseline consumption to account for the efficiency of the fuel injector.
  - When the fuel injector is in optimal condition ($H = 1$), the multiplier becomes $1 + (1 - 1) = 1$, meaning there is no additional consumption penalty.
  - If the fuel injector is partially degraded ($H < 1$), the term $(1 - H)$ becomes positive, which increases the overall fuel consumption proportionally to the degree of degradation. For example, $H = 0.8$ gives a multiplier of 1.2, i.e. 20% more fuel than the baseline.
  - In the extreme case where the injector is completely faulty ($H = 0$), the multiplier becomes $1 + 1 = 2$, effectively doubling the fuel consumption relative to the baseline.

**Additional Considerations:**

- **Impact of Injector Wear:** Over time, fuel injectors may deteriorate due to contaminants, corrosion, or wear, leading to reduced injection efficiency. The quality and type of fuel used can also affect injector health, further influencing fuel consumption.
- **Other Influencing Factors:** Although this equation focuses on engine speed, load, and injector health, other variables can also impact fuel consumption, such as engine temperature, driving conditions (e.g., acceleration, braking), and environmental factors (e.g., altitude, ambient temperature). This model is a simplification that captures the primary influences but can be expanded to include additional factors if needed.

**Literature References:**

- **Heywood, J. B. (1988). *Internal Combustion Engine Fundamentals*. McGraw-Hill Education.** Provides a comprehensive discussion of engine performance, including the effects of engine speed and load on fuel consumption.
- **Incropera, F. P., & DeWitt, D. P. (2002). *Fundamentals of Heat and Mass Transfer*.
Wiley.** Offers foundational knowledge on heat transfer principles, which underpin many empirical scaling factors used in engine models. - **Çengel, Y. A., & Boles, M. A. (2015). *Thermodynamics: An Engineering Approach*. McGraw-Hill Education.** Explores thermodynamic principles relevant to engine performance and the calibration of empirical constants in fuel consumption models. - **Technical Papers on Engine Fuel Consumption Modeling:** Numerous research articles and studies provide experimental support for calibrating constants like $K_{5}$ and understanding the impact of injector efficiency ($H$) on fuel consumption. This detailed model offers a simplified yet practical approach to quantifying fuel consumption, highlighting how engine speed, load, and injector health collectively influence fuel efficiency. ### **3.3.4 Vibration (V(t)):** The vibration level of an engine is affected by both the engine's RPM and mechanical wear. It can be modeled by the equation: $$ V(t) = \left( \frac{\Omega}{K_{6}} \right) \cdot \left(1 - \left( \mu \cdot K_{7} + C \cdot K_{8} \right) \right) $$ Where: - **$V(t)$**: The vibration level at time $t$. This metric often indicates the mechanical stability of the engine and can be related to wear or imbalance. - **$\Omega$**: The engine speed (in revolutions per minute, RPM). Higher engine speeds typically lead to greater vibrations due to increased dynamic forces and imbalances. - **$\mu$**: The oil viscosity, representing the lubricating properties of the engine oil. Higher viscosity generally improves the damping of mechanical vibrations. - **$C$**: The cooling efficiency of the engine’s cooling system. An efficient cooling system helps maintain optimal operating temperatures and reduces thermal expansion-induced imbalances, thus mitigating vibrations. - **$K_{6}$**: A scaling constant used to normalize engine speed within the model. 
- **$K_{7}$** and **$K_{8}$**: Empirical coefficients that quantify the relative contributions of oil viscosity and cooling efficiency, respectively, in reducing vibration.

**Detailed Explanation:**

- **Engine Speed ($\Omega$)**: The term $\left(\frac{\Omega}{K_{6}}\right)$ normalizes the engine speed. As $\Omega$ increases, so does the vibrational energy due to the enhanced dynamic forces acting on engine components.
- **Damping Effects of Lubrication and Cooling**: The expression $\left(1 - \left( \mu \cdot K_{7} + C \cdot K_{8} \right)\right)$ accounts for the damping provided by the engine’s oil and cooling systems:
  - **Oil Viscosity ($\mu$)**: A higher viscosity forms a thicker lubricating film between moving parts, reducing friction and damping vibrations. The coefficient $K_{7}$ is empirically determined to reflect this contribution.
  - **Cooling Efficiency ($C$)**: Efficient cooling prevents excessive thermal expansion and uneven wear, which can lead to vibrations. The coefficient $K_{8}$ represents the relative impact of cooling on vibration reduction.
- **Interplay of Factors**: The overall vibration $V(t)$ is modeled as the balance between the vibrational forces (driven by engine speed) and the mitigating influences of effective lubrication and cooling. The equation assumes an additive damping effect from oil viscosity and cooling efficiency.

**Additional Considerations:**

- **Mechanical Wear and Imbalance**: While the model captures primary influences, other factors, such as structural imbalances, wear of rotating components, and resonance phenomena, can also contribute to overall vibration levels.
- **Empirical Nature of the Model**: The coefficients ($K_{7}$ and $K_{8}$) and the scaling factor ($K_{6}$) are determined through experimental calibration; the simulation code uses 0.3, 0.2 and 3000, respectively. They are adjusted to reflect the typical operating conditions and the relative importance of lubrication and cooling in damping vibrations.

**Literature References:**

- **Heywood, J. B. (1988). _Internal Combustion Engine Fundamentals_. McGraw-Hill Education.** Provides insights into engine dynamics, including factors that affect vibration and mechanical stability.
- **Incropera, F. P., & DeWitt, D. P. (2002). _Fundamentals of Heat and Mass Transfer_. Wiley.** Discusses thermal effects in engines, which indirectly influence vibration through cooling efficiency.
- **Çengel, Y. A., & Boles, M. A. (2015). _Thermodynamics: An Engineering Approach_. McGraw-Hill Education.** Offers a detailed treatment of empirical modeling and the calibration of constants in engineering systems, relevant to vibration and damping analyses.
- **Rao, S. S. (2007). _Mechanical Vibrations_. Prentice Hall.** Provides a comprehensive examination of vibration theory and damping mechanisms in mechanical systems, helping to understand the impact of lubrication and cooling on engine vibrations.

### **3.3.5 Component Degradation**

Each engine component naturally degrades over time as a result of operating stresses, thermal cycling, and wear. The following differential equations represent simplified models for the degradation of key components:

**Cooling System Wear**

$$
\frac{dC}{dt} = -k_{cool} \cdot \left(1 + \frac{T(t) - K_{9}}{K_{10}} \right)
$$

- **$C$**: The cooling efficiency of the engine.
- **$k_{cool}$**: The base cooling degradation rate (an empirically determined constant).
- **$T(t)$**: The engine temperature at time $t$.
- The term $\left(\frac{T(t) - K_{9}}{K_{10}}\right)$ introduces a temperature-dependent factor, implying that higher operating temperatures accelerate the degradation of the cooling system. This is consistent with the understanding that thermal stress can deteriorate cooling performance over time. In the simulation code this term is clamped at zero, so temperatures below $K_{9}$ do not slow the wear.

**Oil Viscosity Wear**

$$
\frac{d\mu}{dt} = -k_{oil} \cdot \left(1 + \frac{t_{run}}{5000} \right)
$$

- **$\mu$**: The oil viscosity, which is critical for proper lubrication.
- **$k_{oil}$**: The base oil degradation rate, which is determined empirically.
- **$t_{run}$**: The running period or total operating time of the engine.
- This equation suggests that as the engine runs longer, oil degrades progressively (through oxidation, contamination, and thermal breakdown), reducing its viscosity. The additional factor $\left(\frac{t_{run}}{5000}\right)$ scales the degradation with the operating time.

**Fuel Injector Wear**

$$
\frac{dH}{dt} = -k_{inj} \cdot \left(1 + \frac{\Omega - 3000}{2000} \right)
$$

- **$H$**: The health of the fuel injector, ranging from 0 (completely degraded) to 1 (perfect health).
- **$k_{inj}$**: The base degradation rate for the fuel injector.
- **$\Omega$**: The engine speed (RPM).
- In this model, when the engine speed exceeds 3000 RPM, the term $\left(\frac{\Omega - 3000}{2000}\right)$ becomes positive, indicating that high RPMs accelerate the wear of the fuel injector. This is in line with observations that sustained high engine speeds can lead to increased wear due to mechanical stress and fuel quality issues. As with the cooling term, the simulation code clamps this factor at zero for speeds below 3000 RPM.

### **3.3.6 High Demand Mode (Boost)**

In high demand or boost mode, the engine experiences transient conditions where both load and RPM are increased gradually, but only up to a safe operational limit. The dynamic adjustments are modeled by the following equations:

**Engine Load Increase**

$$
\frac{dL}{dt} = \alpha_{boost} \cdot (1.05)
$$

- **$L$**: The engine load.
- **$\alpha_{boost}$**: A boost factor representing the rate of increase in load during high demand mode.
- The multiplier 1.05 indicates a gradual 5% increase per unit time until the engine reaches a predetermined safe load limit. In the simulation code this is applied once per cycle as $L \leftarrow \min(L_{max},\ 1.05 \cdot L)$.

**Engine RPM Increase**

$$
\frac{d\Omega}{dt} = \beta_{boost} \cdot (1.03)
$$

- **$\Omega$**: The engine speed (RPM).
- **$\beta_{boost}$**: A boost factor representing the rate of increase in RPM during high demand mode.
- The multiplier 1.03 represents a gradual 3% increase per unit time, ensuring that the engine speed remains within safe operational limits. In the simulation code this is applied once per cycle as $\Omega \leftarrow \min(3000,\ 1.03 \cdot \Omega)$.
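The three wear equations of Section 3.3.5 can be stepped forward in discrete cycles with an explicit Euler update. The following is a minimal sketch, not the article's implementation: the function name `step_degradation` and the step size `dt` are illustrative assumptions, while the default rates ($k_{cool}$, $k_{oil}$, $k_{inj}$), the values $K_{9} = 100$ and $K_{10} = 50$, and the clamping at zero are taken from the simulation listing in this section.

```python
def step_degradation(C, mu, H, T, omega, t_run,
                     k_cool=1e-4, k_oil=2e-4, k_inj=5e-4,
                     K9=100.0, K10=50.0, dt=1.0):
    """One explicit Euler step of the cooling, oil and injector wear equations.

    `step_degradation` and `dt` are hypothetical names for this sketch;
    the default constants mirror the values used in the simulation listing.
    """
    # Temperature and RPM factors are clamped at zero, as in the listing
    dC = -k_cool * (1 + max(0.0, (T - K9) / K10))
    dmu = -k_oil * (1 + t_run / 5000)
    dH = -k_inj * (1 + max(0.0, (omega - 3000) / 2000))
    return C + dt * dC, mu + dt * dmu, H + dt * dH


# Start from the initial component state used in the listing
C, mu, H = 0.80, 0.85, 0.75
for t_run in range(100, 1100):  # 1000 cycles at constant temperature and RPM
    C, mu, H = step_degradation(C, mu, H, T=115.0, omega=1500.0, t_run=t_run)

print(f"C={C:.4f}, mu={mu:.4f}, H={H:.4f}")
```

Because the oil term grows with $t_{run}$, oil viscosity falls slightly faster in later cycles, whereas the cooling and injector terms stay constant as long as temperature and RPM do. The sketch omits the physical limits that the listing enforces with `np.clip`.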
**Literature References and Empirical Basis**

The degradation models and boost mode equations presented above are simplifications based on experimental and empirical observations found in the literature. Some key references that provide theoretical and practical support for these types of models include:

- **Heywood, J. B. (1988). _Internal Combustion Engine Fundamentals_. McGraw-Hill Education.** This foundational text covers the principles of engine operation, including how temperature, load, and RPM affect various engine components.
- **Incropera, F. P., & DeWitt, D. P. (2002). _Fundamentals of Heat and Mass Transfer_. Wiley.** The principles discussed in this book help explain the thermal effects on material properties and the degradation of systems such as engine cooling.
- **Çengel, Y. A., & Boles, M. A. (2015). _Thermodynamics: An Engineering Approach_. McGraw-Hill Education.** Provides insights into the use of empirical constants and scaling factors in modeling engineering systems, which is directly applicable to the calibration of the degradation equations.
- **Rao, S. S. (2007). _Mechanical Vibrations_. Prentice Hall.** Although primarily focused on vibrations, this text also touches on how operational stresses contribute to wear and degradation, supporting the overall approach to empirical modeling.

These references support the use of empirical constants such as $k_{cool}$, $k_{oil}$, and $k_{inj}$, and they provide context for the scaling factors used in the degradation models. While the exact numerical values of the constants (e.g., 5000 in the oil viscosity equation or 2000 in the fuel injector wear equation) are often derived from experimental calibration for a specific engine, the underlying principles are well-grounded in the literature.

**Random High Demand Cycles:**

High demand mode is triggered at random. In each cycle, an independent Bernoulli trial with a fixed probability (10% in the listing below) decides whether the engine enters high demand mode; its duration is then drawn uniformly between a minimum and a maximum number of cycles.

```python
import datetime
import random

import numpy as np


# Class to manage engine state
class EngineState:
    def __init__(self):
        # Operational Parameters
        self.engine_temperature = 115
        self.oil_pressure = 30.0
        self.fuel_consumption = 15.0
        self.vibration_level = 0.2
        self.rpm = 1500
        self.engine_load = 65
        self.running_period = 100
        self.status = "operational"
        self.maintenance_actions = []
        self.high_power_demand = False
        self.demand_duration = 0
        self.original_params = {}  # To store original settings
        self.max_safe_load = 85  # Maximum safe load allowed

        # Degradation Parameters (Dynamic)
        self.cooling_efficiency = 0.80  # Degrades with temperature and time
        self.oil_viscosity = 0.85  # Degrades with temperature and contamination
        self.fuel_injector_health = 0.75  # Degrades with use and bad fuel

        # Base Degradation Factors
        self.base_cooling_degradation = 0.0001  # Degradation per cycle
        self.base_oil_degradation = 0.0002
        self.base_injector_degradation = 0.0005

        # New parameters for high demand management
        self.high_demand_probability = 0.1  # 10% chance to trigger high demand every cycle
        self.min_high_demand_duration = 5  # Minimum duration of high demand mode (cycles)
        self.max_high_demand_duration = 20  # Maximum duration of high demand mode (cycles)
        self.performance_target = 1.0  # 100% normal performance
        self.safe_performance_boost = 1.5  # 50% maximum increase allowed
        self.boost_duration = 0  # Remaining high demand cycles
        self.original_performance_params = {}  # Original parameters before boost

    def update_physics(self):
        """Updates engine variables with dynamic component degradation"""
        # 1. Progressive Degradation of Components
        self._degradate_components()

        # 2. Calculation of Operational Variables (based on degraded parameters)
        self._calculate_temperature()
        self._calculate_oil_pressure()
        self._calculate_fuel_consumption()
        self._calculate_vibration()

        # 3. Update operating time
        self.running_period += 1

    def activate_high_power_mode(self, duration_cycles: int):
        """Activate high performance mode safely"""
        if not self.high_power_demand:
            # Save original configuration
            self.original_performance_params = {
                'rpm': self.rpm,
                'engine_load': self.engine_load,
                'fuel_consumption': self.fuel_consumption
            }

            # Configure demand
            self.high_power_demand = True
            self.boost_duration = duration_cycles
            self.performance_target = min(
                self.safe_performance_boost,
                self.original_performance_params.get('engine_load', 65) * 1.5 / self.max_safe_load
            )
            print(f"🚀 High Performance Mode Activated: Target {self.performance_target*100:.0f}% for {duration_cycles} cycles")

    def check_random_high_demand(self):
        """Checks whether the engine should enter high demand mode randomly"""
        if not self.high_power_demand:  # Only activate if it is not already in high demand
            if random.random() < self.high_demand_probability:
                duration = random.randint(self.min_high_demand_duration, self.max_high_demand_duration)
                self.activate_high_power_mode(duration)

    def _degradate_components(self):
        """Models dynamic component degradation"""
        # Degradation of the Cooling System (worse with high temperature)
        temp_factor = max(0, (self.engine_temperature - 100) / 50)
        self.cooling_efficiency -= self.base_cooling_degradation * (1 + temp_factor)
        self.cooling_efficiency = np.clip(self.cooling_efficiency, 0.4, 0.95)  # Physical limits

        # Degradation of Oil Viscosity (worse with temperature and contamination)
        oil_contamination = self.running_period / 5000  # Progressive accumulation
        self.oil_viscosity -= self.base_oil_degradation * (1 + oil_contamination)
        self.oil_viscosity = np.clip(self.oil_viscosity, 0.5, 1.2)  # Standardized viscosity

        # Injector degradation (worse with high RPM and low quality fuel)
        rpm_factor = max(0, (self.rpm - 3000) / 2000)  # Factor 0-1 for RPM > 3000
        self.fuel_injector_health -= self.base_injector_degradation * (1 + rpm_factor)
        self.fuel_injector_health = np.clip(self.fuel_injector_health, 0.3, 1.0)

    def _calculate_temperature(self):
        """Calculates temperature with degraded cooling efficiency"""
        base_heat = (self.rpm / 1000) * (self.engine_load / 20)
        cooling = self.cooling_efficiency * 15  # Degraded cooling capacity
        self.engine_temperature += (base_heat - cooling) * 0.1 + np.random.uniform(-0.5, 0.5)
        self.engine_temperature = np.clip(self.engine_temperature, 80, 150)  # Safe Limits

    def _calculate_oil_pressure(self):
        """Calculates oil pressure with degraded viscosity"""
        self.oil_pressure = (self.rpm / 500) * self.oil_viscosity + np.random.uniform(-0.2, 0.2)
        self.oil_pressure = np.clip(self.oil_pressure, 20, 60)

    def _calculate_fuel_consumption(self):
        """Calculates fuel consumption with degraded injectors"""
        base_consumption = (self.rpm * self.engine_load) / 1250
        # Multiplier 1 + (1 - H): up to +70% at the 0.3 injector health floor
        efficiency_loss = 1 + (1 - self.fuel_injector_health)
        self.fuel_consumption = base_consumption * efficiency_loss + np.random.uniform(-0.2, 0.2)

    def _calculate_vibration(self):
        """Calculates vibration with general wear"""
        mechanical_wear = 1 - (self.oil_viscosity * 0.3 + self.cooling_efficiency * 0.2)  # Combined factor
        self.vibration_level = (self.rpm / 3000) * mechanical_wear + np.random.uniform(-0.01, 0.01)

    def update_high_power_physics(self):
        """Specific adjustments for high demand operation"""
        if self.high_power_demand and self.boost_duration > 0:
            # Calculate difference to target
            load_ratio = self.engine_load / self.max_safe_load
            rpm_ratio = self.rpm / 3000  # Maximum safe RPM

            # Progressive parameter adjustment
            if load_ratio < self.performance_target:
                self.engine_load = min(
                    self.max_safe_load,
                    self.engine_load * 1.05  # Gradual increase of 5% per cycle
                )

            if rpm_ratio < self.performance_target:
                self.rpm = min(
                    3000,  # Maximum safe RPM
                    self.rpm * 1.03  # Gradual increase of 3% per cycle
                )

            self.boost_duration -= 1
            if self.boost_duration <= 0:
                self._restore_normal_operation()

    def _restore_normal_operation(self):
        """Restores the RPM and load saved before the boost"""
        print("🔁 Restoring normal operation...")
        self.high_power_demand = False
        self.rpm = self.original_performance_params['rpm']
        self.engine_load = self.original_performance_params['engine_load']
        self.performance_target = 1.0

    def get_state_description(self):
        """Returns a textual description of the engine state to agents"""
        return f"""
        Current Engine Status:
        - Temperature: {self.engine_temperature:.1f}°C
        - Oil Pressure: {self.oil_pressure:.1f} psi
        - Engine Load: {self.engine_load:.1f}%
        - RPM: {self.rpm:.0f}
        - Status: {self.status}
        - Cooling Efficiency: {self.cooling_efficiency * 100:.1f}%
        - Oil Viscosity: {self.oil_viscosity:.2f}
        - Injector Health: {self.fuel_injector_health * 100:.1f}%
        """

    def generate_sensor_data(self):
        """Generates sensor data with controlled variation"""
        self.update_physics()  # Update physical parameters first

        return {
            "engine_temp": np.clip(self.engine_temperature + np.random.uniform(-2, 2), 80, 150),
            "running_period": self.running_period,
            "oil_pressure": np.clip(self.oil_pressure + np.random.uniform(-0.5, 0.5), 20, 60),
            "fuel_consumption": np.clip(self.fuel_consumption + np.random.uniform(-0.3, 0.3), 5, 30),
            "vibration_level": np.clip(self.vibration_level + np.random.uniform(-0.02, 0.02), 0, 1),
            "rpm": int(np.clip(self.rpm + np.random.uniform(-50, 50), 800, 3000)),
            "engine_load": np.clip(self.engine_load + np.random.uniform(-2, 2), 0, 100),
            "timestamp": datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        }
```
--DIVIDER--
### **3.4 Predicting Engine Failures with Collected Sensor Data**

To predict failures based on the collected data, we use the machine learning model that was loaded earlier. The `predict_failure` function takes the sensor data generated by the `EngineState` class, asks the model for the probability of each failure class, and returns the most likely class.
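The class-selection step described above reduces to an argmax over one row of class probabilities. The sketch below illustrates only that step under stated assumptions: `StubPipeline`, `top_class` and the three class labels are hypothetical stand-ins, not the article's trained pipeline or its real class names.

```python
import numpy as np


class StubPipeline:
    """Hypothetical stand-in for a fitted classifier exposing predict_proba."""

    def predict_proba(self, X):
        # One row of class probabilities per input sample
        return np.array([[0.1, 0.7, 0.2]])


# Hypothetical class labels, ordered like the probability columns
demo_class_names = ["normal", "overheating", "oil_leak"]


def top_class(pipeline, X, names):
    """Return the most likely class label and its probability for one sample."""
    proba = pipeline.predict_proba(X)[0]  # first (and only) row
    idx = int(np.argmax(proba))
    return names[idx], float(proba[idx])


label, p = top_class(StubPipeline(), [[0.0]], demo_class_names)
print(label, p)
```

Indexing the first row before calling `np.argmax` keeps the index aligned with the class list even if the function is later fed more than one sample at a time.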

```python
def predict_failure(sensor_data: dict):
    """
    Method for Predicting Engine Failures Based on Sensor Data.
    """
    registro_coleta = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    sample_input_df = pd.DataFrame([sensor_data])
    probabilities = loaded_pipeline.predict_proba(sample_input_df)
    predicted_class = class_names[np.argmax(probabilities)]

    # Save prediction
    predictions[registro_coleta] = {
        "sensor_data": sensor_data,
        "predicted_class": predicted_class,
        "probabilities": probabilities[0].tolist()
    }

    print(f"Predicted class: {predicted_class}")
    return predicted_class
```

With this structure, the system simulates the degradation of engine components and uses the collected data to predict failures, helping to create a more efficient environment for predictive maintenance and engine performance control.

--DIVIDER--
### **3.5 Analyzing Sensor Data Trends for Predictive Maintenance**

In this section, we introduce a method for analyzing trends in the collected sensor data. The `analyze_trends` function performs statistical analysis and trend detection on the historical sensor data, providing insights into engine performance. This method is valuable for identifying patterns and anomalies that could signal potential issues, helping to guide predictive maintenance actions.

Here’s the implementation of the `analyze_trends` function:

```python
def analyze_trends(sensor_data_history):
    """
    Performs statistical and trend analysis on sensor data.
    """
    print(sensor_data_history)
    df = pd.DataFrame(sensor_data_history)
    analysis = {
        "max_temperature": df["engine_temp"].max(),
        "mean_temperature": df["engine_temp"].mean(),
        "min_temperature": df["engine_temp"].min(),
        "max_oil_pressure": df["oil_pressure"].max(),
        "mean_oil_pressure": df["oil_pressure"].mean(),
        "min_oil_pressure": df["oil_pressure"].min(),
        # Slope of a first-degree least-squares fit over the sample index
        "fuel_consumption_trend": np.polyfit(range(len(df)), df["fuel_consumption"], 1)[0],
        # Number of readings with vibration above 0.3
        "vibration_anomalies": df[df["vibration_level"] > 0.3].shape[0]
    }
    return analysis
```

This function processes historical sensor data to extract key metrics: the maximum, mean and minimum engine temperature and oil pressure, the fuel consumption trend (the slope of a linear fit, in consumption units per sample), and the number of vibration readings above 0.3. The results can be used to predict when maintenance or corrective actions may be needed based on trends and detected irregularities.

--DIVIDER--
### **3.6 Generating Visualizations for Sensor Data Trends**

In this section, we implement a function to generate individual plots for each key variable from the historical sensor data. The `generate_plots` function creates visual representations of engine performance trends over time, helping to identify potential issues visually. The generated plots will be saved as images for further analysis and reporting.

Here’s the implementation of the `generate_plots` function:

```python
def generate_plots(sensor_data_history):
    """
    Generates individual graphs for each variable from the sensor data history.
    """
    timestamps = [data["timestamp"] for data in sensor_data_history]
    metrics = {
        "engine_temp": [data["engine_temp"] for data in sensor_data_history],
        "oil_pressure": [data["oil_pressure"] for data in sensor_data_history],
        "fuel_consumption": [data["fuel_consumption"] for data in sensor_data_history],
        "vibration_level": [data["vibration_level"] for data in sensor_data_history],
        "rpm": [data["rpm"] for data in sensor_data_history],
        "engine_load": [data["engine_load"] for data in sensor_data_history]
    }

    # One figure per metric, saved as <metric>_trend.png
    for metric, values in metrics.items():
        plt.figure(figsize=(12, 8))
        plt.plot(timestamps, values, label=metric)
        plt.xlabel("Timestamp")
        plt.ylabel("Value")
        plt.title(f"Trend of {metric} Over Time")
        plt.legend()
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.savefig(f"{metric}_trend.png")
        plt.close()
```

This function processes the historical sensor data and generates a plot for each of the key engine parameters: engine temperature, oil pressure, fuel consumption, vibration level, RPM, and engine load. The plots help visualize trends over time and can be used to detect deviations from normal operating conditions, enabling more proactive maintenance and troubleshooting actions.

--DIVIDER--
### **3.7 Generating a Comprehensive Markdown Report**

This section describes the implementation of a function designed to generate a comprehensive Markdown report summarizing engine performance. The report includes trend analysis, predictions, maintenance actions, and sensor data visualizations. It organizes the information chronologically, presenting key metrics and actions to facilitate decision-making for predictive maintenance and performance optimization.

Here’s the code for generating the Markdown report:

```python
def generate_markdown_report(predictions, trend_analysis, maintenance_actions, performance_overview):
    """
    Generates a report in Markdown format with the results of predictions, analyses and charts.
    Organizes information chronologically, grouping actions, predictions and sensor data.
    """
    report = "# Engine Performance Report\n\n"

    # Add trend analysis
    report += "## Trend Analysis\n"
    report += f"- Maximum Temperature: {trend_analysis['max_temperature']:.2f}°C\n"
    report += f"- Average Temperature: {trend_analysis['mean_temperature']:.2f}°C\n"
    report += f"- Minimum Temperature: {trend_analysis['min_temperature']:.2f}°C\n"
    report += f"- Maximum Oil Pressure: {trend_analysis['max_oil_pressure']:.2f} psi\n"
    report += f"- Average Oil Pressure: {trend_analysis['mean_oil_pressure']:.2f} psi\n"
    report += f"- Minimum Oil Pressure: {trend_analysis['min_oil_pressure']:.2f} psi\n"
    report += f"- Fuel Consumption Trend: {trend_analysis['fuel_consumption_trend']:.2f}\n"
    report += f"- Vibration Anomalies: {trend_analysis['vibration_anomalies']}\n\n"

    # Add component health (reads the module-level engine_state instance)
    report += "## Component Health\n"
    report += f"- Cooling Efficiency: {engine_state.cooling_efficiency * 100:.1f}%\n"
    report += f"- Oil Viscosity: {engine_state.oil_viscosity:.2f}\n"
    report += f"- Injector Health: {engine_state.fuel_injector_health * 100:.1f}%\n\n"

    # Chronological Section: Actions, Forecasts and Sensor Data
    report += "## Chronology of Events\n"

    # Combine predictions and actions into a single list ordered by timestamp
    events = []

    # Add predictions
    for timestamp, prediction in predictions.items():
        events.append({
            "type": "prediction",
            "timestamp": timestamp,
            "data": prediction
        })

    # Add actions
    for action in maintenance_actions:
        events.append({
            "type": "action",
            "timestamp": action["timestamp"],
            "data": action
        })

    # Sort events by timestamp
    events.sort(key=lambda x: x["timestamp"])

    # Iterate over the ordered events
    for event in events:
        if event["type"] == "prediction":
            prediction = event["data"]
            report += f"### {event['timestamp']} - Failure Prediction\n"
            report += f"- **Expected Class:** {prediction['predicted_class']}\n"

            # Create probability table by class
            report += "- **Probabilities per Failure Class:**\n\n"
            report += "|Failure Class | Probability |\n"
            report += "|----------------------|---------------|\n"
            for class_name, prob in zip(class_names, prediction['probabilities']):
                report += f"| {class_name.ljust(20)} | {f'{prob:.2f}'.rjust(13)} |\n"
            report += "\n"  # Add a blank line after the table

            report += f"- **Sensor Data:** {json.dumps(prediction['sensor_data'], indent=2)}\n\n"
        elif event["type"] == "action":
            action = event["data"]
            report += f"### {event['timestamp']} - Corrective Action\n"
            report += f"- **Action:** `{action['action']}`\n"
            report += f"- **Reason:** {action['reason']}\n"
            # Manual reference, when the action record carries one
            # (apply_agent_decision stores it under "reference")
            guide_reference = action.get("guide_reference") or action.get("reference")
            if guide_reference:
                report += f"- **Manual Reference:** {guide_reference}\n"
            report += "\n"

    # Performance Analysis Section by Advisor
    report += "## Performance Analysis by the Expert\n"
    report += f"{performance_overview}\n\n"

    # Add individual charts
    report += "## Trend Charts\n"
    metrics = ["engine_temp", "oil_pressure", "fuel_consumption", "vibration_level", "rpm", "engine_load"]
    for metric in metrics:
        report += f"### {metric.replace('_', ' ').title()}\n"
        report += f"![Trend of {metric.replace('_', ' ')}]({metric}_trend.png)\n\n"

    # Save the report to a Markdown file
    with open("performance_report.md", "w", encoding="utf-8") as file:
        file.write(report)

    return report
```

This function generates a detailed Markdown report that includes:

- **Trend Analysis:** Key metrics like engine temperature, oil pressure, fuel consumption trends, and vibration anomalies.

- **Component Health:** Information on cooling efficiency, oil viscosity, and fuel injector health.

- **Event Chronology:** A timeline of engine failure predictions and maintenance actions, including sensor data and failure probabilities.

- **Performance Overview:** A section dedicated to insights from the performance expert.

- **Trend Graphs:** Visualizations of sensor data trends over time.
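The event chronology listed above depends on merging two differently shaped collections (a dict of predictions keyed by timestamp and a list of action records) and sorting them together. A minimal sketch of that step follows; `build_timeline` and the sample records are hypothetical and exist only for illustration.

```python
def build_timeline(predictions, maintenance_actions):
    """Merge predictions and actions into one list ordered by timestamp."""
    events = [
        {"type": "prediction", "timestamp": ts, "data": pred}
        for ts, pred in predictions.items()
    ]
    events += [
        {"type": "action", "timestamp": act["timestamp"], "data": act}
        for act in maintenance_actions
    ]
    # 'YYYY-MM-DD HH:MM:SS' strings sort lexicographically in chronological order
    return sorted(events, key=lambda e: e["timestamp"])


# Hypothetical sample records
sample_predictions = {
    "2025-01-01 10:00:05": {"predicted_class": "overheating"},
    "2025-01-01 09:59:58": {"predicted_class": "normal"},
}
sample_actions = [
    {"timestamp": "2025-01-01 10:00:02", "action": "monitor", "reason": "demo"},
]

timeline = build_timeline(sample_predictions, sample_actions)
for event in timeline:
    print(event["timestamp"], event["type"])
```

String sorting works here only because the timestamp format is zero-padded and ordered from year down to second. Note also that keying predictions by a one-second-resolution timestamp means two predictions made within the same second would overwrite each other.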

This structured report provides a comprehensive view of the engine's health and operational status, supporting proactive decision-making in predictive maintenance.--DIVIDER--### **3.8 Performance Overview Generation for Predictive Maintenance**

This section focuses on generating a performance overview of the engine, leveraging the engine state information and engaging the maintenance_advisor to provide a detailed report. The analysis is based on the engine's operational history, efficiency, and key metrics such as oil viscosity and injector health. The resulting performance overview includes recommendations for future maintenance actions and a life expectancy outlook. Here’s the code for generating the performance overview:

```python
def generate_performance_overview(engine_state: EngineState):
    """Generates a general performance analysis using the maintenance advisor"""
    analysis_context = f"""
    Context for Analysis:
    - Operating Hours: {engine_state.running_period}
    - Last Status: {engine_state.status}
    - Cooling Efficiency: {engine_state.cooling_efficiency * 100:.1f}%
    - Oil Viscosity: {engine_state.oil_viscosity:.2f}
    - Injector Health: {engine_state.fuel_injector_health * 100:.1f}%
    - Action History: {len(engine_state.maintenance_actions)} interventions
    """

    overview_task = Task(
        description=f'''Analyze the engine’s overall performance considering:
        {analysis_context}

        Provide:
        1. Assessment of general condition
        2. Main points of attention
        3. Recommendations for preventive maintenance
        4. Remaining useful life perspective
        5. **Classification of problem severity and prioritization of corrective actions**
        6. **Analysis of efficiency, including:**
           - Impact on fuel consumption
           - Thermal efficiency and potential waste
        7. **Suggestions for optimization, such as:**
           - Operational adjustments to reduce wear
           - Strategies to extend component lifespan
        ''',
        agent=maintenance_advisor,
        expected_output='''Report structured with:
        - Executive summary
        - Main risks identified
        - Recommended action plan
        - Prioritized list of issues and corrective actions
        - Efficiency insights and operational optimization suggestions
        '''
    )

    # A single-agent crew runs the overview task
    overview_crew = Crew(
        agents=[maintenance_advisor],
        tasks=[overview_task],
        verbose=False
    )

    return overview_crew.kickoff()
```

**3.8.1 Analysis Context:**

- The context for the analysis is constructed using the `engine_state` object, which contains crucial data such as running period, status, cooling efficiency, oil viscosity, fuel injector health, and the maintenance action history.

**3.8.2 Task Creation:**

- A task is created to analyze the engine’s overall performance using the `maintenance_advisor`. The task description specifies the analysis context and outlines the expected output, including a structured report containing an executive summary, key risks, and a recommended action plan.

**3.8.3 Crew Setup:**

- A `Crew` is set up to handle the task execution, with the `maintenance_advisor` acting as the agent. The task is assigned to the crew, and the execution is initiated via `kickoff()`.

**Expected Output:** The task will return a detailed performance overview that includes a summary of the engine's current state, potential risks, maintenance recommendations, and an estimate of the remaining operational life.

--DIVIDER--
### **3.9 Creating Tasks for Action Decisions and Failure Predictions**

In this section, we create several tasks that form the backbone of the predictive maintenance system. These tasks involve real-time data collection, failure predictions, reporting and maintenance recommendations. Each task is configured with an agent (such as `maintenance_advisor` or `fault_predictor`), which performs specific functions, such as predicting failures or generating action plans based on sensor data.

**3.9.1. Action Decision Task & Application of the Agent's Decision:**

The `create_action_decision_task` function is responsible for generating a task that evaluates the current situation of the engine, taking into account the high demand context (if applicable), the failure prediction, the engine state and defined decision criteria. The task asks `maintenance_advisor` for a recommended action, such as reducing load, adjusting RPM, or shutting down the engine. The agent returns a decision, which is then applied to the engine state.

After the decision is generated, the `apply_agent_decision` function applies the recommended action to the engine state. This includes adjusting the RPM, reducing the load, or even turning off the engine, depending on the action chosen by the agent. The action performed is also recorded, with the time, reason and reference to the maintenance guide.

```python
import datetime
import json
import random
import re


# Build the decision task handed to the maintenance advisor
def create_action_decision_task(prediction_data: dict, engine_state: EngineState):
    # Additional context for high demand
    high_demand_context = ""
    if engine_state.high_power_demand:
        high_demand_context = f"""
        ## Active High Demand Mode
        - Remaining cycles: {engine_state.boost_duration}
        - Target Performance: {engine_state.performance_target * 100:.0f}%
        - Current RPM: {engine_state.rpm} / Max Safe: 3000
        - Current Load: {engine_state.engine_load}% / Max Safe: {engine_state.max_safe_load}%
        """

    return Task(
        description=f'''## Technical Context
        {high_demand_context}
        **Current Prediction**: {prediction_data['predicted_class']}
        **Probabilities**: {json.dumps(prediction_data['probabilities'], indent=2)}
        **Engine Status**: {engine_state.get_state_description()}

        ## Decision Criteria
        1. Temperature must remain between 20°C and 120°C
        2. Oil pressure between 25psi and 40psi
        3. Engine load should not exceed 80%

        ## Allowed Actions:
        - "reduce_load": Reduce load by 30%
        - "shutdown": Emergency shutdown
        - "adjust_rpm": Adjust RPM by ±100
        - "cool_down": Reduce RPM for cooling
        - "monitor": Monitor only
        - "maintain_boost": Maintain high demand (only in high demand mode)

        ## Response Format:
        Return the response in the format:
        Action:
        Reason:
        ''',
        agent=maintenance_advisor,
        expected_output='Response in the format "Action: \nReason: "',
        callback=lambda output: apply_agent_decision(output, engine_state)
    )


def apply_agent_decision(output, engine_state: EngineState):
    try:
        # Check critical conditions and consult the maintenance guide
        if engine_state.engine_temperature > 130:
            emergency_guidance = maintenance_guide_tool.search(
                query="Emergency procedures for overheating conditions"
            )
            print(f"⚠️ GUIDANCE: {emergency_guidance[:200]}...")
            # Add guidance to reason for action
            output = f"Action: shutdown\nReason: Critical temperature detected. {emergency_guidance[:200]}..."

        # Ensure output is a string
        if not isinstance(output, str):
            output = str(output)  # Convert to string

        # Extract action, reference and reason
        action = "monitor"  # Default value (lowercase, to match the comparisons below)
        reference = "No references provided."
        reason = "No explanation provided."

        # Use regex to extract the action, reference and reason
        action_match = re.search(r"Action:\s*(.+)", output)
        reference_match = re.search(r"Reference:\s*(.+)", output)
        reason_match = re.search(r"Reason:\s*(.+)", output)

        if action_match:
            action = action_match.group(1).strip().lower()
        if reference_match:
            reference = reference_match.group(1).strip()
        if reason_match:
            reason = reason_match.group(1).strip()

        print(f"\n🔧 Action Decided: {action.upper()}")
        print(f"📖 Reference: {reference}")
        print(f"📝 Reason: {reason}")

        # Apply action with realistic physical effects
        if action == "reduce_load":
            # Load reduction affects RPM and consumption
            engine_state.engine_load = max(20, engine_state.engine_load - 30)
            engine_state.rpm = max(1000, engine_state.rpm - 200)
            reason += " Reduced load to avoid overloading."

        elif action == "shutdown":
            # Gradual shutdown
            engine_state.rpm = 0
            engine_state.engine_load = 0
            engine_state.status = "shutdown"
            reason += " Engine off for safety."

        elif action == "cool_down":
            # Reduce RPM for cooling, never below 800 RPM and never raising it
            reduction = max(0, min(300, engine_state.rpm - 800))
            engine_state.rpm -= reduction
            reason += f" Engine cooling, RPM reduced to {engine_state.rpm}."

        elif action == "adjust_rpm":
            # RPM adjustment with inertia
            # NOTE: the task prompt advertises ±100; the step applied here is ±150
            adjustment = random.choice([-150, 150])
            new_rpm = engine_state.rpm + adjustment

            # Physical limits and relationship with load
            if new_rpm < 800:
                new_rpm = 800
                reason += " Minimum RPM reached (800 RPM)."
            elif new_rpm > 2800:
                new_rpm = 2800
                reason += " Maximum RPM reached (2800 RPM)."

            engine_state.rpm = new_rpm
            # Automatic load adjustment (cannot be changed directly)
            engine_state.engine_load = min(100, (new_rpm / 2800) * 100)
            reason += f" Load automatically adjusted to {engine_state.engine_load:.1f}%."

        elif action == "monitor":
            reason += " No action required. Monitoring the engine."

        # After applying the action
        engine_state.maintenance_actions.append({
            "timestamp": datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            "action": action,
            "reference": reference,  # Include reference to the manual
            "reason": reason
        })

        # Update engine physics after action
        engine_state.update_physics()

        # Update engine status
        engine_state.status = f"{action} - {reason[:50]}..."  # Limit status size

        # Check critical limits
        if engine_state.engine_temperature > 120:
            print("⚠️ CRITICAL: Excessive temperature! Automatic shutdown.")
            engine_state.status = "emergency_shutdown"
            engine_state.rpm = 0
            engine_state.engine_load = 0

    except Exception as e:
        print(f"Error applying decision: {str(e)}")


real_time_monitoring_task = Task(
    description='''Continuously collect and visualize engine sensor data in real time.
    Generate time-series graphs to track key metrics such as engine temperature, oil pressure, fuel consumption, vibration levels, RPM, and engine load.''',
    agent=real_time_monitor,
    expected_output='''
    - A set of real-time visualizations displaying sensor data trends.
    - Alerts when engine parameters exceed safe operational limits.'''
)


def fault_prediction_task_function(sensor_data):
    predicted_class = predict_failure(sensor_data)
    return {
        "predicted_class": predicted_class,
        "sensor_data": sensor_data,
        "timestamp": datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    }


fault_prediction_task = Task(
    description='''Analyze real-time sensor data and predict potential engine failures using the trained machine learning model.
    Identify failure types and provide probability scores for each classification.''',
    agent=fault_predictor,
    expected_output='''- A classification of engine health status with probability scores.
    - A timestamped log of detected anomalies and potential failures.''',
    function=fault_prediction_task_function
)

reporting_task = Task(
    description='''Review collected engine data and fault predictions to generate a detailed performance report.
The report should include an analysis of engine trends, detected anomalies, and predictive maintenance recommendations.''', agent=reporting_analyst, expected_output=''' - A comprehensive markdown report summarizing engine performance. - Visualizations of sensor data trends. - A summary of predicted failures and their probability.''') maintenance_recommendation_task = Task( description='''Based on real-time monitoring, fault predictions, and the official maintenance guide: **Important!** Always consult the maintenance guide document using the provided tool for: - Standard operating procedures - Manufacturer recommendations - Approved repair techniques - Safety protocols Generate maintenance recommendations that: 1. Combine machine learning predictions with real-time sensor data analysis 2. Include specific references to the maintenance guide (section numbers or procedures) 3. Follow the manufacturer's guidelines for each recommended action **Output Format:** - Action: - Reference: - Reason: ''', agent=maintenance_advisor, expected_output='''- A structured maintenance plan with recommended actions prioritized by criticality - Specific references to maintenance guide sections - Clear reasoning based on data and manufacturer guidelines''' ) ``` **3.9.2. Complementary Tasks:**

In addition to the action decision task, other tasks are created to perform real-time monitoring, failure prediction, and reporting on engine performance:

- **Real-Time Monitoring:** Collects and visualizes data from engine sensors in real time, generating graphs and alerts when engine parameters exceed safe limits.

- **Failure Prediction:** Analyzes sensor data in real time and predicts potential engine failures based on a machine learning model.

- **Performance Report Generation:** Compiles collected data, including detected anomalies, and generates a detailed report with maintenance recommendations.

- **Maintenance Recommendations:** Generates a maintenance plan based on failure predictions and guidelines from the official maintenance guide.
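
As an illustration of the alerting behavior described for the monitoring task, the following is a minimal sketch of a limit check built on the decision criteria used by the action decision task (temperature 20 to 120 °C, oil pressure 25 to 40 psi, load up to 80%). The function name `check_limits`, the `SAFE_LIMITS` table, the dictionary keys, and the 0% lower bound for load are assumptions made for this example; they are not taken from the project's code.

```python
from typing import Dict, List, Tuple

# Safe ranges taken from the decision criteria of the action decision task.
# Key names and the 0% lower bound for load are assumptions for this sketch.
SAFE_LIMITS: Dict[str, Tuple[float, float]] = {
    "engine_temperature": (20.0, 120.0),  # °C
    "oil_pressure": (25.0, 40.0),         # psi
    "engine_load": (0.0, 80.0),           # %
}


def check_limits(sensor_data: Dict[str, float]) -> List[str]:
    """Return one alert message per reading that falls outside its safe range."""
    alerts = []
    for name, (low, high) in SAFE_LIMITS.items():
        value = sensor_data.get(name)
        if value is None:
            continue  # reading not present in this sample
        if value < low or value > high:
            alerts.append(f"{name}={value} outside safe range [{low}, {high}]")
    return alerts


# Hypothetical sample: temperature and load are out of range, oil pressure is fine
sample = {"engine_temperature": 126.5, "oil_pressure": 31.0, "engine_load": 85.0}
alerts = check_limits(sample)
for alert in alerts:
    print(alert)
```

Keeping the limits in a single table means the monitoring alerts and the decision criteria given to the agent can be checked against the same numbers.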

These tasks work together to provide a robust predictive maintenance solution, based on real-time sensor data and advanced predictive analytics.

--DIVIDER--

### **3.9 Advanced Engine Simulation**

Finally, the `advanced_simulation` function integrates several components to simulate an advanced engine management system, combining real-time sensor data, failure predictions, and dynamic decision-making. The function iterates through multiple cycles in which it evaluates the engine state, simulates potential high-demand scenarios, and monitors key parameters such as engine temperature.

At the beginning of each iteration, the simulation checks whether high demand should be triggered, which influences the engine's operating conditions. It then generates sensor data to assess the engine's performance under those conditions. This data is passed to the `predict_failure` function, which analyzes the current engine state and predicts possible issues.

The simulation creates two tasks dynamically: the first predicts the engine state from the sensor data, and the second chooses the best course of action based on the most recent failure prediction. These tasks are executed in sequence by the crew, a team of agents responsible for real-time monitoring, fault prediction, reporting, and maintenance recommendations.

As the simulation progresses, the engine state is continuously updated with new data, including physics-based changes under both regular and high-demand conditions. If the engine's temperature exceeds a critical threshold, an emergency shutdown is triggered to prevent damage. Once all cycles have run, the results are collected and the simulation proceeds with trend analysis and performance evaluation, including reports and visualizations.

The overall process highlights the importance of integrating predictive analytics, automated decision-making, and continuous monitoring for effective engine management and maintenance.

```python
import datetime
import json
import time

from crewai import Crew, Task


# Integrated Simulation
def advanced_simulation():
    engine_state = EngineState()

    for _ in range(18):
        # Randomly decide whether to activate high demand
        engine_state.check_random_high_demand()

        # Generate sensor data and add it to the history
        sensor_data = engine_state.generate_sensor_data()
        sensor_data_history.append(sensor_data)

        # Failure prediction
        predicted_class = predict_failure(sensor_data)

        # Create and execute tasks dynamically
        prediction_task = Task(
            description=f'''Analyze sensor data:
{json.dumps(sensor_data, indent=2)}
Classify the engine state using the loaded model.''',
            agent=fault_predictor,
            function=lambda: predict_failure(sensor_data),
            expected_output="The predicted class of the engine condition."
        )

        decision_task = create_action_decision_task(
            predictions[next(reversed(predictions))],  # Latest prediction
            engine_state
        )

        # Continuous update even without actions
        engine_state.update_physics()
        engine_state.update_high_power_physics()  # High demand specific physics

        # Execute tasks in sequence
        crew.tasks = [prediction_task, decision_task]
        crew.kickoff()

        print(f"\n📊 Current Status: {engine_state.get_state_description()}")

        # Critical operating limits
        if engine_state.engine_temperature > 150:
            print("⚠️ CRITICAL: Excessive temperature! Automatic shutdown.")
            engine_state.status = "emergency_shutdown"
            break

        time.sleep(2)

        if decision_task.output:
            # Use 'raw' attribute to get output as string
            output_text = decision_task.output.raw
            lines = output_text.split('\n')
            engine_state.maintenance_actions.append({
                "timestamp": datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                # Prefixes match the "Action:" / "Reason:" format requested in the task
                "action": lines[0].replace('Action: ', ''),
                "reason": lines[1].replace('Reason: ', '') if len(lines) > 1 else ''
            })

    return engine_state


# Creating the CrewAI Team
crew = Crew(
    agents=[real_time_monitor, fault_predictor, reporting_analyst, maintenance_advisor],
    tasks=[real_time_monitoring_task, fault_prediction_task, reporting_task,
           maintenance_recommendation_task],
    verbose=True
)

# Simulation:
# Reset states
predictions.clear()
sensor_data_history.clear()

engine_state = advanced_simulation()

# Generate graphs and analysis
generate_plots(sensor_data_history)
trend_analysis = analyze_trends(sensor_data_history)

# Generate performance analysis
performance_overview = generate_performance_overview(engine_state)

# Run the team to generate the report
resultado = crew.kickoff()
print("\n📄 Final Report:\n", resultado)
print("\n📊 Trend Analysis:\n", trend_analysis)

# Generate the report in Markdown
report = generate_markdown_report(predictions, trend_analysis,
                                  engine_state.maintenance_actions, performance_overview)
print("\n📝 Markdown report saved in 'performance_report.md'.")
```

--DIVIDER--

## **4. Results**

To present the results of the agent interactions, a Jupyter Notebook (`Naval_Engines`) is provided, offering a step-by-step analysis of each agent's actions throughout the simulation. Additionally, two reports generated by the agents are included, detailing the interactions performed and the engine's performance analysis during the operational phase.

The first report examines an `overheating` failure, describing the corrective actions taken by the agents to resolve the issue and optimize engine performance.

📄 **Performance Report - Overheating:** [View here](https://github.com/EduardoSantosSousa/Smart_Engine_Agents/blob/main/performance_report._Overheating.md)

The second report focuses on `mechanical wear`, outlining the preventive measures implemented by the agents to mitigate its effects.

📄 **Performance Report - Mechanical Wear:** [View here](https://github.com/EduardoSantosSousa/Smart_Engine_Agents/blob/main/performance_report_Mechanical_Wear_Condition.md)

🔹 *Note:* Ensure that all trend images are stored in the same folder to display correctly in the reports.

--DIVIDER--

## **5. References**

1. Marques, F., & Brito, M. (2019). *Maintenance Strategies in Marine Engines*. Available at: [https://www.researchgate.net/publication/333567890_Maintenance_Strategies_in_Marine_Engines](https://www.researchgate.net/publication/333567890_Maintenance_Strategies_in_Marine_Engines)

2. Macnica DHW. (2023). *Manutenção Preditiva na Indústria Naval: um novo caminho*. Available at: [https://www.macnicadhw.com.br/blog/manutencao-preditiva-na-industria-naval](https://www.macnicadhw.com.br/blog/manutencao-preditiva-na-industria-naval)

3. Filtrovali. (2019). *Benefícios da Manutenção Preditiva em Motores Marítimos*. Available at: [https://www.filtrovali.com.br/blog/beneficios-da-manutencao-preditiva-em-motores-maritimos](https://www.filtrovali.com.br/blog/beneficios-da-manutencao-preditiva-em-motores-maritimos)

4. Martin, L., & Polanco, A. (2020). *Advanced Maintenance Strategies for Industrial Machinery*. Available at: [https://www.researchgate.net/publication/341759360_Advanced_Maintenance_Strategies_for_Industrial_Machinery](https://www.researchgate.net/publication/341759360_Advanced_Maintenance_Strategies_for_Industrial_Machinery)

5. Hayes, S. (2022). *How IoT and Predictive Maintenance Are Changing Industrial Operations*. Available at: [https://www.manufacturingglobal.com/technology/how-iot-and-predictive-maintenance-are-changing-industrial-operations](https://www.manufacturingglobal.com/technology/how-iot-and-predictive-maintenance-are-changing-industrial-operations)