The final—and most advanced—requirement of the project was to simulate a real-time Industry 5.0 production environment. Static models (like the Random Forest trained in Phase 4b) degrade over time in production due to sensor decalibration, operator fatigue, or process changes—a phenomenon known as Concept Drift.
My goal was to build a streaming architecture that not only learns incrementally (row by row) but also monitors its own error rate to detect when the physical reality on the factory floor changes.
The Streaming Architecture & Prequential Evaluation
Traditional machine learning relies on a fixed train/test split. Here, I instead used the river library to implement Prequential Evaluation (Test-Then-Train).
As each of the 97,612 time-windows arrives:
1. Test: the model predicts the window's label before seeing the ground truth.
2. Evaluate: the prediction is scored against the true label to update the running accuracy.
3. Monitor: the outcome (correct or incorrect) is fed to the drift detector.
4. Train: only then does the model learn from the window.
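For reference, river bundles this exact test-then-train pattern into a helper, evaluate.progressive_val_score; the hand-written loop in the code analysis below does the same job while also exposing the drift detector. A minimal sketch, assuming the simulate_stream generator and the model defined later in this section:
Python
from river import evaluate, metrics

# Prequential (test-then-train) evaluation in a single call:
# every window is scored before the model is allowed to learn from it.
final_accuracy = evaluate.progressive_val_score(
    dataset=simulate_stream(feat_df, inject_drift=True),
    model=model,
    metric=metrics.Accuracy(),
    print_every=10_000,  # log the running accuracy every 10,000 windows
)
print(final_accuracy)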
Simulating the Real World: The Drift Injection
To rigorously test the system, I designed a simulate_stream function. For the first half of the dataset (0 to 48,806 windows), the stream runs normally. However, precisely at the 50% mark, I injected a 30% label noise rate (randomly flipping labels to incorrect classes). This simulated a sudden, severe process failure or sensor malfunction on the production line.
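The original simulate_stream implementation isn't reproduced here, so the following is only a minimal sketch of the injection logic; the label column name, the uniform random label flip, and the assumption that feat_df is a pandas DataFrame of engineered features are mine:
Python
import random

def simulate_stream(feat_df, inject_drift=False, noise_rate=0.30,
                    label_col="label", seed=42):
    """Yield (features, label) pairs, corrupting labels after the 50% mark."""
    rng = random.Random(seed)
    n = len(feat_df)
    classes = list(feat_df[label_col].unique())
    for i, (_, row) in enumerate(feat_df.iterrows()):
        x = row.drop(label_col).to_dict()
        y = row[label_col]
        # Simulated process failure: from the halfway point onward,
        # flip ~30% of labels to a randomly chosen wrong class.
        if inject_drift and i >= n // 2 and rng.random() < noise_rate:
            y = rng.choice([c for c in classes if c != y])
        yield x, y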
The ADWIN & ARF Response (The "Aha!" Moment)
For the modeling and detection engine, I paired an Adaptive Random Forest (ARF) with an ADWIN (Adaptive Windowing) drift detector.
ADWIN registered a statistically significant spike in the error rate and fired a [DRIFT DETECTED] alert just 58 windows (~3 seconds) after the process was corrupted.
Phase 5 Code Analysis:
1. The Incremental Learning Loop
Python
for i, (x, y_true) in enumerate(simulate_stream(feat_df, inject_drift=True)):
    y_pred = model.predict_one(x)                 # Test
    accuracy.update(y_true, y_pred)               # Evaluate
    drift_detector.update(int(y_pred != y_true))  # Monitor error
    model.learn_one(x, y_true)                    # Train
This loop is the beating heart of the system. It processes the feature-engineered equivalent of the entire 10 GB dataset continuously, one window at a time, and memory use stays near-constant because only the model and the current window ever live in RAM.
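The [DRIFT DETECTED] alert quoted above is raised from inside this same loop. A minimal sketch of the check, placed right after the drift_detector.update(...) call, assuming a recent river release where ADWIN exposes a drift_detected flag (older releases name it change_detected) and an illustrative print format:
Python
    # Inside the loop body, immediately after drift_detector.update(...)
    if drift_detector.drift_detected:
        print(f"[DRIFT DETECTED] at window {i} "
              f"(running accuracy: {accuracy.get():.3f})")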
2. Model Definition
Python
from river import drift, forest, metrics, preprocessing

# Online pipeline: incremental scaling feeding a 10-tree Adaptive Random Forest
model = preprocessing.StandardScaler() | forest.ARFClassifier(n_models=10, seed=42)

# Running prequential accuracy and the ADWIN detector used in the loop above
accuracy = metrics.Accuracy()
drift_detector = drift.ADWIN(delta=0.002)
The | operator chains the scaler and the classifier into a single online pipeline, so each window is incrementally scaled before hitting the ARF. n_models=10 provided enough ensemble power to handle the 15 classes while keeping per-window computation light enough for real-time use.
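Worth noting: the ARF itself also embeds per-tree drift and warning detectors, which govern when individual trees are retired and replaced; the global ADWIN above monitors the error rate of the whole pipeline instead. If those internal detectors needed tuning, the constructor exposes them. A sketch with illustrative values (not the settings used in the project):
Python
from river import drift, forest, preprocessing

# Same pipeline, with the ARF's internal per-tree detectors made explicit.
model = preprocessing.StandardScaler() | forest.ARFClassifier(
    n_models=10,
    seed=42,
    drift_detector=drift.ADWIN(delta=0.001),   # per-tree drift threshold
    warning_detector=drift.ADWIN(delta=0.01),  # starts a background tree earlier
)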