Car Maintenance Dataset
Description:
This dataset is a large-scale, fully synthetic simulation of vehicle component wear and failure patterns over time. It covers 10,000 unique cars across 80 months of operation and tracks 20 critical automotive parts. Each record simulates realistic degradation influenced by driving style, road type, car segment, and environmental conditions, making it ideal for predictive maintenance research, machine learning, and fleet management simulations.
Key Features:
Car Profiles: Includes car segment (sedan, truck_suv, sports_car), driving style (conservative, normal, aggressive), and road exposure (city_mixed, highway_dominant, rough_terrain).
Environmental Context: Models the impact of hot, cold, dusty, and rainy conditions on component wear.
Parts Monitored: Engine oil, brake pads, tires, air filter, battery, timing belt, transmission fluid, AC compressor, alternator, water pump, exhaust system, coolant, fuel pump, fuel filter, oxygen sensor, spark plugs, cabin filter, wiper blades, shock absorber, control arm.
Wear Metrics: Each part includes wear_score, remaining useful life (RUL), and failure flags, with multi-output labels for machine learning tasks.
Advanced Simulation Features:
Mileage spikes and extreme environmental events for edge-case realism.
Correlation effects between parts (e.g., oil stress influencing other components).
Multi-task labels: a separate failure flag per part, so each part's failure can be predicted independently.
Lag features for temporal modeling of wear trends.
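As a rough illustration of the lag features mentioned above, the sketch below recomputes one-month lags with pandas. It assumes that each lag is taken within a single (car_id, part) series ordered by month. The dataset does not document how its own lag columns were built, so treat this as one plausible reading rather than the generator's actual method. The demo rows are made up.

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add one-month lag columns within each (car_id, part) series.

    Assumption: a lag is the previous month's value for the same car and part.
    """
    out = df.sort_values(["car_id", "part", "month"]).copy()
    grouped = out.groupby(["car_id", "part"], sort=False)
    out["wear_lag1"] = grouped["wear_score"].shift(1)
    out["part_km_since_service_lag1"] = grouped["part_km_since_service"].shift(1)
    return out

# Tiny made-up frame for illustration only (not real rows from the dataset).
demo = pd.DataFrame({
    "car_id": [1, 1, 1, 1],
    "part": ["battery", "battery", "tires", "tires"],
    "month": [2, 1, 1, 2],
    "wear_score": [0.20, 0.10, 0.30, 0.35],
    "part_km_since_service": [2000, 1000, 1000, 2000],
})
lagged = add_lag_features(demo)
print(lagged)
```

The first month of every series has no predecessor, so its lag values are NaN. Models that cannot handle missing values would need those rows dropped or imputed.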
Dataset Structure:
car_id: Unique identifier for each car
month: Month number (1–80)
part: Name of the automotive part
total_km: Cumulative mileage of the car
part_km_since_service: Kilometers since last service of the part
part_months_since_service: Months since last service
wear_score: Normalized wear of the part (0–1 scale)
wear_normalized: Wear score normalized across fleet
rul: Estimated remaining useful life
failed_[part]: Binary failure flag, one column per part (multi-task labels)
environment, driving_style, road_type, car_segment: Contextual features
temp_severity_score, service_interval, oil_stress_mult, km_mult_event, temp_stress_mult: Simulation-derived variables
wear_lag1, part_km_since_service_lag1: Lag features for temporal modeling
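A minimal loading sketch, assuming the column names listed above. It reads a small in-memory excerpt so that it runs without the real file. The row values and the exact failed_* column spellings (failed_battery, failed_tires) are hypothetical.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt; column names follow the list above,
# but the values and the exact failed_* spellings are assumptions.
sample_csv = io.StringIO(
    "car_id,month,part,total_km,wear_score,rul,failed_battery,failed_tires\n"
    "1,1,battery,1200,0.05,58,0,0\n"
    "1,2,battery,2500,0.09,57,0,0\n"
    "1,2,tires,2500,0.95,1,0,1\n"
)
df = pd.read_csv(sample_csv)

# Collect the per-part failure flags that serve as multi-task labels.
label_cols = [c for c in df.columns if c.startswith("failed_")]

# Basic sanity checks implied by the column descriptions.
assert df["wear_score"].between(0, 1).all()
assert df["month"].between(1, 80).all()
assert df[label_cols].isin([0, 1]).all().all()

print(label_cols)
print(df[label_cols].mean())  # share of rows flagged as failed, per part
```

For the real file, pass its path to pd.read_csv in place of sample_csv. With millions of rows, the usecols, dtype, or chunksize arguments of pd.read_csv can help keep memory use manageable.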
Potential Uses:
Machine Learning & AI: Train predictive maintenance models, multi-task learning for component failure prediction, RUL estimation, anomaly detection.
Research & Simulations: Analyze fleet wear patterns, environmental effects, and maintenance scheduling.
Software Development: Use as a synthetic data source for automotive monitoring tools and digital twin simulations.
Education & Prototyping: Ideal for teaching predictive maintenance concepts and testing ML pipelines without real-world data constraints.
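When training models on this kind of data, one common precaution is to split by car rather than by row, because monthly records from the same car are strongly correlated. The sketch below is one way to do that, not a procedure prescribed by the dataset. The function name split_by_car and its parameters are hypothetical, and the demo frame is made up.

```python
import numpy as np
import pandas as pd

def split_by_car(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0):
    """Split rows so that each car_id lands entirely in train or in test."""
    rng = np.random.default_rng(seed)
    car_ids = rng.permutation(df["car_id"].unique())
    n_test = max(1, int(round(len(car_ids) * test_fraction)))
    is_test = df["car_id"].isin(car_ids[:n_test])
    return df[~is_test], df[is_test]

# Made-up frame: 10 cars with 3 monthly rows each.
demo = pd.DataFrame({
    "car_id": np.repeat(np.arange(10), 3),
    "month": np.tile([1, 2, 3], 10),
    "wear_score": np.linspace(0.0, 0.9, 30),
})
train, test = split_by_car(demo)
print(len(train), len(test))  # 24 6
```

A split by month (train on early months, test on later ones) is an alternative when the goal is forecasting for known cars rather than generalizing to unseen ones.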
Why Choose This Dataset:
Fully synthetic and privacy-compliant: no personal or sensitive data.
High fidelity: models complex correlations between environment, driving behavior, and part wear.
Large scale: 10,000 cars × 80 months × 20 parts = about 16 million records.
Ready for multi-task ML applications with precomputed RUL and failure flags.
License:
Carithm Synthetic Car Maintenance Dataset v2.4
© 2025 Carithm
Grant of Use:
You are granted a non-exclusive, non-transferable license to use this dataset for:
Research purposes
Machine learning model development
Testing and evaluation
Commercial model training and deployment
Restrictions:
You MAY NOT:
Resell or redistribute the dataset in its original form
Claim ownership of the dataset
Use the dataset for illegal or unethical purposes
Attribution:
Please attribute the dataset when using it in publications, projects, or products:
“Dataset generated with Carithm AI Synthetic Engine v2.4 — © 2025 Carithm”
No Warranty:
This dataset is provided “as is” without warranty of any kind. Carithm is not responsible for any consequences arising from the use of this data.
Commercial License:
For exclusive rights, redistribution, or bulk use beyond typical ML model training, please contact Carithm for a separate commercial license.
By downloading or using this dataset, you agree to these terms.
File Format: CSV (carithm_synthetic_dataset_v2_4_multi_task.csv)
10,000 synthetic cars, 20 parts, and 80 months of wear and failure data with driving, road, and environment factors, delivered as a multi-task-ready CSV for ML and analysis.