$120+

Car Maintenance Dataset

Description:

This dataset is a large-scale, fully synthetic simulation of vehicle component wear and failure patterns over time. It covers 10,000 unique cars across 80 months of operation and tracks 20 critical automotive parts. Each record simulates realistic degradation influenced by driving style, road type, car segment, and environmental conditions, making it ideal for predictive maintenance research, machine learning, and fleet management simulations.


Key Features:

Car Profiles: Includes car segment (sedan, truck_suv, sports_car), driving style (conservative, normal, aggressive), and road exposure (city_mixed, highway_dominant, rough_terrain).

Environmental Context: Models the impact of hot, cold, dusty, and rainy conditions on component wear.

Parts Monitored: Engine oil, brake pads, tires, air filter, battery, timing belt, transmission fluid, AC compressor, alternator, water pump, exhaust system, coolant, fuel pump, fuel filter, oxygen sensor, spark plugs, cabin filter, wiper blades, shock absorber, control arm.

Wear Metrics: Each part includes wear_score, remaining useful life (RUL), and failure flags, with multi-output labels for machine learning tasks.
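As a rough illustration of how the multi-output labels could be separated from the other columns, here is a minimal sketch. It assumes only that the failure flags are the columns whose names start with `failed_`, as the structure listing suggests. The helper name `split_features_and_labels`, the sample rows, and the exact flag column names (`failed_battery`, `failed_brake_pads`) are made up for the example and are not taken from the file.

```python
import csv
import io

def split_features_and_labels(row):
    """Separate one CSV row (a dict) into features and per-part failure labels."""
    features = {}
    labels = {}
    for name, value in row.items():
        if name.startswith("failed_"):
            # Flags are described as binary; int(float(...)) accepts "1" and "1.0".
            labels[name] = int(float(value))
        else:
            features[name] = value
    return features, labels

# Tiny made-up sample; the flag column names here are assumptions.
sample = io.StringIO(
    "car_id,month,part,wear_score,failed_battery,failed_brake_pads\n"
    "1,12,battery,0.41,0,0\n"
    "1,13,battery,0.97,1,0\n"
)
rows = list(csv.DictReader(sample))
features, labels = split_features_and_labels(rows[1])
```

Each key in `labels` can then serve as one output of a multi-task model, while `features` holds the remaining columns as raw strings.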


Advanced Simulation Features:

Mileage spikes and extreme environmental events for edge-case realism.

Correlation effects between parts (e.g., oil stress influencing other components).

Multi-task labels: a separate failure flag per part, so each part's failure can be predicted independently.

Lag features for temporal modeling of wear trends.
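The listing does not say how the lag features were produced. One plausible reading is that a lag-1 feature holds the previous month's value for the same car and part. The sketch below works under that assumption. The function name `add_lag1` and the sample rows are hypothetical.

```python
def add_lag1(rows, value_key="wear_score", lag_key="wear_lag1"):
    """Attach the previous observed value per (car_id, part); None for the first row."""
    last_seen = {}
    out = []
    # Sort so that each car/part series is visited in month order.
    for row in sorted(rows, key=lambda r: (r["car_id"], r["part"], r["month"])):
        key = (row["car_id"], row["part"])
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[lag_key] = last_seen.get(key)
        last_seen[key] = row[value_key]
        out.append(new_row)
    return out

# Made-up rows, deliberately out of order.
rows = [
    {"car_id": 1, "part": "tires", "month": 2, "wear_score": 0.20},
    {"car_id": 1, "part": "tires", "month": 1, "wear_score": 0.10},
    {"car_id": 2, "part": "tires", "month": 1, "wear_score": 0.05},
]
lagged = add_lag1(rows)
```

Note that this uses the previous observed row rather than strictly "month minus one", so a series with missing months would need extra handling.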


Dataset Structure:

car_id: Unique identifier for each car

month: Month number (1–80)

part: Name of the automotive part

total_km: Cumulative mileage of the car

part_km_since_service: Kilometers since last service of the part

part_months_since_service: Months since last service

wear_score: Normalized wear of the part (0–1 scale)

wear_normalized: Wear score normalized across fleet

rul: Estimated remaining useful life

failed_[part]: Binary flag for part failure (multi-task)

environment, driving_style, road_type, car_segment: Contextual features

temp_severity_score, service_interval, oil_stress_mult, km_mult_event, temp_stress_mult: Simulation-derived variables

wear_lag1, part_km_since_service_lag1: Lag features for temporal modeling
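Because the file is large, it may be convenient to stream it row by row rather than load it all at once. The following sketch computes the mean `wear_score` per `part` using only the standard library. It assumes a header row with the column names listed above and skips rows where `wear_score` is empty. The function name `mean_wear_by_part` and the sample data are illustrative.

```python
import csv
import io
from collections import defaultdict

def mean_wear_by_part(lines):
    """Stream CSV rows and return the mean wear_score for each part."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        if not row["wear_score"]:
            continue  # skip missing values instead of failing on float("")
        totals[row["part"]] += float(row["wear_score"])
        counts[row["part"]] += 1
    return {part: totals[part] / counts[part] for part in totals}

# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "car_id,month,part,wear_score\n"
    "1,1,tires,0.10\n"
    "1,2,tires,0.30\n"
    "1,1,battery,0.50\n"
)
means = mean_wear_by_part(sample)

# With the real file, the same function could be applied to an open handle:
# with open("carithm_synthetic_dataset_v2_4_multi_task.csv", newline="") as f:
#     means = mean_wear_by_part(f)
```

Only running totals are held in memory, so the approach does not depend on the size of the file.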


Potential Uses:

Machine Learning & AI: Train predictive maintenance models, apply multi-task learning to component failure prediction, estimate RUL, and detect anomalies.

Research & Simulations: Analyze fleet wear patterns, environmental effects, and maintenance scheduling.

Software Development: Use as a synthetic data source for automotive monitoring tools and digital twin simulations.

Education & Prototyping: Ideal for teaching predictive maintenance concepts and testing ML pipelines without real-world data constraints.
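When testing ML pipelines on per-car time series like these, a common precaution is to keep all rows of a given car in the same split, so that later months of a car seen in training do not leak into evaluation. The sketch below shows one way to do that with a stable hash of `car_id`. The function name `assign_split`, the 20% test fraction, and the assumption that ids run from 1 to 10,000 are illustrative choices, not part of the dataset.

```python
import hashlib

def assign_split(car_id, test_fraction=0.2):
    """Deterministically map a car_id to 'train' or 'test' via a stable hash."""
    digest = hashlib.sha256(str(car_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"

# Assumed id range for illustration; the real ids may be formatted differently.
splits = {car_id: assign_split(car_id) for car_id in range(1, 10001)}
test_share = sum(1 for s in splits.values() if s == "test") / len(splits)
```

Because the assignment depends only on the id, it stays the same across runs and machines, and it can be applied while streaming the CSV without a separate lookup table.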


Why Choose This Dataset:

Fully synthetic and privacy-compliant—no personal or sensitive data.

High fidelity: models complex correlations between environment, driving behavior, and part wear.

Large scale: 10,000 cars × 80 months × 20 parts = 16 million part-month records.

Ready for multi-task ML applications with precomputed RUL and failure flags.


License:

Carithm Synthetic Car Maintenance Dataset v2.4

© 2025 Carithm

Grant of Use:

You are granted a non-exclusive, non-transferable license to use this dataset for:

Research purposes

Machine learning model development

Testing and evaluation

Commercial model training and deployment

Restrictions:

You MAY NOT:

Resell or redistribute the dataset in its original form

Claim ownership of the dataset

Use the dataset for illegal or unethical purposes

Attribution:

Please attribute the dataset when using it in publications, projects, or products:

“Dataset generated with Carithm AI Synthetic Engine v2.4 — © 2025 Carithm”

No Warranty:

This dataset is provided “as is” without warranty of any kind. Carithm is not responsible for any consequences arising from the use of this data.

Commercial License:

For exclusive rights, redistribution, or bulk use beyond typical ML model training, please contact Carithm for a separate commercial license.

By downloading or using this dataset, you agree to these terms.


File Format: CSV (carithm_synthetic_dataset_v2_4_multi_task.csv)

10,000 synthetic cars, 20 parts, and 80 months of wear and failure data, with driving, road, and environment factors. Multi-task ready, in a single CSV for ML and analysis.

Size
683 MB