Best stroke prediction dataset github

o Replacing the outlier values with the mode.

Perform extensive exploratory data analysis, apply three clustering algorithms and three classification algorithms to the given stroke prediction dataset, and report the best findings.

We analyze a stroke dataset and formulate various statistical models for predicting whether a patient has had a stroke based on measurable predictors.

Input Features: id: A unique identifier for each patient in the dataset.

Input data is preprocessed and given to over 7 models, where a maximum accuracy of 99.4% is achieved.

Topics: machine-learning random-forest svm jupyter-notebook logistic-regression lda knn baysian stroke-prediction

If not available on GitHub, the notebook can be accessed on nbviewer, or alternatively on Kaggle.

Analysis of the Stroke Prediction Dataset provided on Kaggle. Kaggle is an AirBnB for Data Scientists. - victorjongsoon/stroke-prediction

Jun 13, 2021 · Download the Stroke Prediction Dataset from Kaggle and extract the file healthcare-dataset-stroke-data.csv.

The project aims to display charts and plots of the number of people affected by stroke, based on input parameters such as smoking status, high blood pressure level, cholesterol level and obesity level, in some countries.

Machine learning models were evaluated with Pandas in Jupyter notebooks using a stroke prediction dataset.

The high mortality and long-term care requirements impose a significant burden on healthcare systems and families. Timely prediction and prevention are key to reducing its burden.

Because the dataset is relatively small (although quite big for a single healthcare facility), every possible effort was made to minimize or eliminate overfitting, from k-fold cross-validation to hyperparameter optimization (using grid search CV) to find the best value for each parameter in a model.
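The k-fold cross-validation mentioned above can be sketched without any library. This is only an illustration: the stratified variant, the function name `stratified_kfold_indices` and the toy labels are assumptions, not code from any of the listed repositories (which would typically use scikit-learn).

```python
from collections import defaultdict


def stratified_kfold_indices(labels, k):
    """Split row indices into k folds with a similar class mix in each fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the rows of each class round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical toy labels: 20 non-stroke rows (0) and 5 stroke rows (1).
labels = [0] * 20 + [1] * 5
folds = stratified_kfold_indices(labels, k=5)

for i, test_idx in enumerate(folds):
    # Every fold serves once as the held-out set; the rest is training data.
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    positives = sum(labels[j] for j in test_idx)
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)} stroke_in_test={positives}")
```

The sketch does not shuffle rows before dealing them out; on real data a seeded shuffle per class would normally come first.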
csv │ └── raw/ │ └── healthcare-dataset

Using SQL and Power BI, it aims to identify trends and corr

Stroke Prediction Dataset. Context: According to the World Health Organization (WHO), stroke is the 2nd leading cause of death globally, responsible for approximately 11% of total deaths.

The chosen model was connected to an interactive Tableau dashboard that predicts a user's stroke risk using a Tabpy server.

The API can be integrated seamlessly into existing healthcare systems.

Using the "Stroke Prediction Dataset" available on Kaggle, our primary goal for this project is to delve deeper into the risk factors associated with stroke.

o use SMOTE from

- baisali14/Hypertension-Heart-Disease-and-Stroke-Prediction-using-SVM This repository holds a machine learning model trained using SVM to predict whether a person has hypertension or not, the person has heart disease or not and the person has stroke

Comparing 10 different ML classifiers and using the one with the best accuracy to predict the user's stroke risk.

#Hypothesis: people who had a stroke have a higher bmi than people who had no stroke.

Achieved high recall for stroke cases.

Contribute to arturnovais/Stroke-Prediction-Dataset development by creating an account on GitHub.

PREDICTION-STROKE/ ├── data/ │ ├── models/ │ │ ├── best_stroke_model.

We conclude that age, hypertension and the self-employed work type affect the likelihood of having a stroke.

Using visualization libraries, plotted various plots such as pie charts, count plots and curves.

The stroke prediction dataset was used to perform the study. - bpalia/StrokePrediction.

o Scale the values of avg_glucose_level, bmi, and age by using StandardScaler in sklearn.
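For the scaling step on avg_glucose_level, bmi and age, here is a minimal dependency-free sketch of what standardization does: subtract the column mean and divide by the population standard deviation, which is the convention StandardScaler uses. The sample rows and the helper name `standardize` are made up for illustration.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical rows using the dataset's column names.
rows = [
    {"age": 67.0, "avg_glucose_level": 228.69, "bmi": 36.6},
    {"age": 80.0, "avg_glucose_level": 105.92, "bmi": 32.5},
    {"age": 49.0, "avg_glucose_level": 171.23, "bmi": 34.4},
    {"age": 31.0, "avg_glucose_level": 88.10, "bmi": 24.0},
]

scaled = {
    column: standardize([row[column] for row in rows])
    for column in ("age", "avg_glucose_level", "bmi")
}

for column, values in scaled.items():
    print(column, [round(v, 3) for v in values])
```

In a real pipeline the mean and standard deviation would be computed on the training split only and then reused for the test split.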
This project predicts stroke disease using three ML algorithms - fmspecial/Stroke_Prediction

Machine Learning project using the Kaggle Stroke Dataset where I perform exploratory data analysis, data preprocessing, classification model training (Logistic Regression, Random Forest, SVM, XGBoost, KNN), hyperparameter tuning, stroke prediction, and model evaluation.

joblib │ │ ├── model_metadata.

Dataset: Stroke Prediction Dataset. This project predicts stroke occurrences using machine learning on a healthcare dataset.

In this project, I use the Heart Stroke Prediction dataset from WHO to predict heart stroke. - ajspurr/stroke_prediction

Predicting whether a patient is likely to get stroke or not - terickk/stroke-prediction-dataset

Data Preprocessing: This includes handling missing values, encoding categorical variables, dealing with outliers, and normalizing the data to prepare it for modeling.

The model is trained on a dataset with various health-related features to predict the likelihood of a stroke occurrence. - cayelsie/Stroke-prediction

Contribute to Aftabbs/Stroke-Prediction-using-Machine-Learning development by creating an account on GitHub.

gender: The gender of the patient, which can be "Male" or "Female".

Incorporate more data: To improve our dataset in the next iterations, we need to include more data points of people with stroke so that we can balance the target before modeling.

Sep 15, 2022 · Authors Visualization 3.

joblib │ ├── processed/ │ │ ├── processed_stroke_data.

The dataset includes 100k patient records.
I perform EDA using the Pandas, seaborn and matplotlib libraries. I used machine learning algorithms for categorical output, such as logistic regression, decision tree, random forest, KNN, AdaBoost, gradient boosting and XGBoost, with and without hyperparameter tuning. I concluded, the

This prediction model was developed to predict stroke cases in patients, given the increase in overall cases across the world.

Synthetically generated dataset containing Stroke Prediction metrics.

Analysis of the Stroke Prediction Dataset to provide insights for the hospital.

html is pressed) and converts it into an array.

As said above, there are 12 features: one target feature or response variable (stroke) and 11 explanatory variables.

This project uses six machine learning models (XGBoost, Random Forest Classifier, Support Vector Machine, Logistic Regression, Single Decision Tree Classifier, and TabNet) to make stroke predictions.

Contribute to CTrouton/Stroke-Prediction-Dataset development by creating an account on GitHub.

Alleviate healthcare costs associated with long-term stroke care.

- EDA-Clustering-Classification-on-Stroke-Prediction-Dataset/README.

This dataset is used to predict whether a patient is likely to get stroke based on the input parameters like gender, age, and various diseases and smoking status.

Libraries Used: Pandas, Scikit-learn, Keras, Tensorflow, MatPlotLib, Seaborn, and NumPy. DataSet Description: The Kaggle stroke prediction dataset contains over 5 thousand samples with 11 total features (3 continuous) including age, BMI, average glucose level, and more.

The output attribute is a

The dataset used in the development of the method was the open-access Stroke Prediction dataset.
15,000 records & 22 fields of stroke prediction dataset, containing: 'Patient ID', 'Patient Name', 'Age', 'Gender', 'Hypertension', 'Heart Disease', 'Marital Status', 'Work Type

The aim of this project is to determine the best model for predicting brain stroke on the given dataset, to enable early intervention and preventive measures that reduce the incidence and impact of strokes, improving patient outcomes and overall healthcare.

For this purpose, I used the "healthcare-dataset-stroke-data" from Kaggle.

The dataset used to build our model is the Stroke Prediction Dataset, which is available on Kaggle.

Analysis based on 4 different machine learning models.

The value of the output column stroke is either 1 or 0.

Prediction of brain stroke based on imbalanced dataset in

- NIRMAL1508/STROKE-DISEASE-PREDICTION In this project, we used logistic regression to discover the relationship between stroke and other input features.

Stroke Disease Prediction classifies a person with Stroke Disease and a healthy person based on the input dataset. - mmaghanem/ML_Stroke_Prediction

We aim to identify the factors that con

Prediction of stroke in patients using machine learning algorithms.

This dataset has been used to predict stroke with 566 different model algorithms.

This dataset was created by fedesoriano and it was last updated 9 months ago.

In this project, we will attempt to classify stroke patients using a dataset provided on Kaggle: Kaggle Stroke Dataset.

The system uses a 70-30 training-testing split.
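A 70-30 training-testing split like the one mentioned above could look roughly like this. The sketch stratifies by class so that the rare stroke rows appear in both parts; the stratification, the seed and the helper name `stratified_split` are assumptions for illustration, since the source only states the 70-30 ratio.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=42):
    """Return (train_indices, test_indices), keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical toy labels: 90 non-stroke rows (0) and 10 stroke rows (1).
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels)

print("train rows:", len(train_idx), "test rows:", len(test_idx))
print("stroke rows in test:", sum(labels[i] for i in test_idx))
```

With a purely random split on an imbalanced dataset, the test part can end up with very few stroke rows, which makes recall estimates unstable.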
The objective is to predict brain stroke from a patient's records, such as age, bmi score, heart problems, hypertension and smoking habits.

The dataset used to predict stroke is a dataset from Kaggle. These features are selected based on our earlier discussions.

Stroke Prediction Analysis Project: This project explores a dataset on stroke occurrences, focusing on factors like age, BMI, and gender.

There were 5110 rows and 12 columns in this dataset.

Data exploration, preprocessing, analysis and building a stroke model prediction in the life of the patient.

GitHub repository for stroke prediction project.

This dataset is imbalanced.

Performing Various Classification Algorithms with GridSearchCV to find the tuned parameters - STROKE_PREDICTION_DATASET/Stroke_Prediction_Dataset.

This project utilizes ML models to predict stroke occurrence based on patient demographic, medical, and lifestyle data.

The dataset for the project has the following columns: id: unique identifier; gender: "Male", "Female" or "Other"; age: age of the patient; hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension

With the help of the Kaggle stroke prediction dataset, identify patients with a stroke.

4) Which type of ML model is it and what has been the approach to build it? This is a classification type of ML model.

o Convert categorical variables to numbers by LabelEncoder in sklearn.
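Converting a categorical column such as gender to numbers can be sketched as below. It mirrors the behaviour of a label encoder (classes sorted, then numbered from 0) without importing sklearn; the helper name `fit_label_encoder` and the sample values are illustrative assumptions.

```python
def fit_label_encoder(values):
    """Map each distinct label to an integer, with classes in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


# Hypothetical sample of the gender column.
gender = ["Male", "Female", "Other", "Female"]

mapping = fit_label_encoder(gender)
encoded = [mapping[value] for value in gender]

print(mapping)  # {'Female': 0, 'Male': 1, 'Other': 2}
print(encoded)  # [1, 0, 2, 0]
```

The integer codes imply an ordering that the categories do not have, so for linear models such as logistic regression one-hot encoding is often preferred for columns like work_type.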
Column Name: Data Type: Description
id: Integer: Unique identifier
gender: Object: "Male", "Female", "Other"
age: Float: Age of patient
hypertension: Integer: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension

11 clinical features for predicting stroke events

predict() method takes input from the request (once the 'compute' button from index.

- hridaybasa/Stroke-Prediction-Using-Data-Science-And-Machine-Learning

Project Title: "Cerebral-Stroke-Prediction", for predicting whether a patient will suffer from a stroke, in order to provide timely interventions.

This dataset is used to predict whether a patient is likely to get stroke based on the input parameters like gender, age, various diseases, and smoking status. - ankitlehra/Stroke-Prediction-Dataset---Exploratory-Data-Analysis

Nov 1, 2022 · Here we present results for stroke prediction when all the features are used and when only 4 features (A, HD, AG and HT) are used.

A dataset containing all the required fields to build robust AI/ML models to detect Stroke. Each row in the data provides relevant information about the patient.

Model comparison techniques are employed to determine the best-performing model for stroke prediction.

md at main · KSwaviman/EDA-Clustering-Classification-on-Stroke-Prediction-Dataset

Predicting whether a patient is likely to get stroke or not - stroke-prediction-dataset/README.

This project builds a classifier for stroke prediction, which predicts the probability of a person having a stroke along with the key factors that play a major role in causing a stroke.

Among the records, 1.5% are related to stroke patients and the remaining 98.5% are related to non-stroke patients.
Contribute to Rasha-A21/Stroke-Prediction-Dataset development by creating an account on GitHub.

The Stroke Prediction dataset is taken from Kaggle.

md at main · terickk/stroke-prediction-dataset

Stroke is a leading cause of death and disability worldwide. Brain stroke poses a critical challenge to global healthcare systems due to its high prevalence and significant socioeconomic impact.

We have also done hyperparameter tuning for each model.

The following approach is used: Contribute to enot9910/Stroke-Prediction-Dataset development by creating an account on GitHub.

This model is created with the following data in mind: patient data which includes medical history and demographic information.

There are more females than males in the data set.

It's a crowd-sourced platform to attract, nurture, train and challenge data scientists from all around the world to solve data science, machine learning and predictive analytics problems.

age: The age

In our project we want to predict stroke using machine learning classification algorithms, and evaluate and compare their results.

Deployment and API: The stroke prediction model is deployed as an easy-to-use API, allowing users to input relevant health data and obtain real-time stroke risk predictions.

#Create two tables: stroke people, normal people #At 99% CI, the stroke people bmi is higher than normal people bmi at 0.47 - 2.82 bmi

Data Source: The healthcare-dataset-stroke-data.csv from the Kaggle Website, credit to the author of the dataset fedesoriano.

In the code, we have created an instance of Flask() and loaded the model.
Stroke Prediction Using Machine Learning (Classification use case). Topics: machine-learning model logistic-regression decision-tree-classifier random-forest-classifier knn-classifier stroke-prediction

and choosing the best one (for this case): the

Contribute to HemantKumarRathore/STROKE-PREDICTION-using-multiple-ML-algorithem-and-comparing-best-accuracy-based-on-given-dataset development by creating an account

Hi all, this is the capstone project on the stroke prediction dataset. - NVM2209/Cerebral-Stroke-Prediction

Performing Various Classification Algorithms with GridSearchCV to find the tuned parameters - Akshay672/STROKE_PREDICTION_DATASET

Contribute to KhaledFadi/Stroke-Prediction development by creating an account on GitHub.

We used the "Stroke Prediction Dataset" from Kaggle as our dataset.

Leveraged skills in data preprocessing, balancing with SMOTE, and hyperparameter optimization using KNN and Optuna for model tuning.

The dataset has 4 numerical variables: "id", "age", "avg_glucose_level" and "bmi".

joblib │ │ └── optimized_stroke_model.

Feature Engineering; o Substituting the missing values with the mean.
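Substituting missing values with the mean, as in the feature engineering step above, can be sketched as follows. The bmi values are invented, and representing a missing value as `None` is an assumption of this sketch (a pandas-based notebook would see NaN instead).

```python
from statistics import fmean


def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = fmean(observed)
    return [mean if v is None else v for v in values]


# Hypothetical bmi column with two missing entries.
bmi = [36.6, None, 32.5, 34.4, 24.0, None]
filled = impute_with_mean(bmi)

print(filled)
```

To avoid leaking information from the test set, the mean would normally be computed on the training rows and reused when filling the test rows.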
This code demonstrates the development of a stroke prediction model using machine learning and the deployment of the model as a FastAPI web service.

Strokes are becoming more common among females than males; a person's type of residence has no bearing on whether or not they have a stroke.

We did the following tasks: Performance Comparison using Machine Learning Classification Algorithms on a Stroke Prediction dataset.

Dependencies Python (v3.

Divide the data randomly into training and testing sets.

3) What does the dataset contain? This dataset contains 5110 entries and 12 attributes related to brain health.

#Conclusion: Reject the null hypothesis, finding that higher bmi level is likely

The objective is to use the best machine learning model, then go back to study the correct predictions and find out more about the characteristics of stroke patients.

By developing a predictive model, we aim to: Reduce the incidence of stroke through early intervention.

We tune parameters with Stratified K-Fold Cross Validation, ROC-AUC, Precision-Recall Curves and feature importance analysis.

Fetching user details through a web app hosted on Heroku. - GitHub - Assasi

Stroke prediction with machine learning and SHAP algorithm using Kaggle dataset - Silvano315/Stroke_Prediction.

df.isnull().sum()
OUTPUT:
id 0
gender 0
age 0
hypertension 0
heart_disease 0
ever_married 0
work_type 0
Residence

Later tuned model by selecting variables with high coefficient > 0.

The system uses Logistic Regression: Logistic Regression is a regression model in which the response variable (dependent variable) has categorical values such as True/False or 0/1.

The dataset consists of over $5000$ individuals and $10$ different input variables that we will use to predict the risk of stroke.
Topics: heroku scikit-learn prediction stroke-prediction

Brain Stroke Prediction: Project on predicting brain stroke on an imbalanced dataset with various ML algorithms and DL to find the optimal model for use in medical applications.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4908 entries, 0 to 5109
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   id             4908 non-null   int64
 1   gender         4908 non-null   object
 2   age            4908 non-null   float64
 3   hypertension   4908 non-null   int64
 4   heart_disease  4908 non-null   int64
 5   ever_married   4908 non-null   object
 6   work_type      4908 non-null   object
 7   Residence

Project Introduction: My project is titled "Cerebral-Stroke-Prediction", with the goal of predicting whether a patient will suffer from a stroke so that timely interventions can be provided.

py has the main function and contains all the required functions for the flask app.

Marital status and presence of heart disease have no significant effect on stroke; older age, hypertension, higher glucose level and higher BMI increase the risk of stroke.

At the conclusion of segment 1 of this project we have tried several different machine learning models with this dataset (RandomForestClassifier, BalancedRandomForestClassifier, LogisticRegression, and Neural Network).

- GitHub - sa-diq/Stroke-Prediction: Prediction of stroke in patients using machine learning algorithms.

Contact Info: Please direct all communications to Henry Tsai @ hawkeyedatatsai@gmail.com

In the Heart Stroke dataset, the two classes are heavily imbalanced, and the heart stroke datapoints are easily ignored compared with the no heart stroke datapoints.

Contribute to kushal3877/Stroke-Prediction-Dataset development by creating an account on GitHub.
Recall is very useful when you have to

project aims to predict the likelihood of a stroke based on various health parameters using machine learning models.

to make predictions of stroke cases based on simple health

Selected features using SelectKBest and F_Classif.

Contribute to 9amomaru/Stroke-Prediction-Dataset development by creating an account on GitHub.

The goal here is to get the best accuracy on a larger dataset.

Check for Missing values # lets check for null values df.

This notebook, 2-model.ipynb, selects a model across many different classifiers and tunes the best selected classifiers using cross-validation.

I have done EDA, visualisation, encoding, scaling and modelling of the dataset.

This dataset has: 5110 samples or rows; 11 features or columns; 1 target column (stroke).

The dataset consists of 11 clinical features which contribute to stroke occurrence. The input variables are both numerical and categorical and will be explained below. - msn2106/Stroke-Prediction-Using-Machine-Learning

Feb 7, 2024 · Their objectives encompassed the creation of ML prediction models for stroke disease, tackling the challenge of severe class imbalance presented by stroke patients while simultaneously delving into the model's decision-making process, but they achieved low accuracy (73.52%) and a high FP rate (26.57%) using Logistic Regression on the Kaggle dataset.

Mar 7, 2025 · Dataset Source: Healthcare Dataset Stroke Data from Kaggle.

The "Cerebral Stroke Prediction" dataset is a real-world dataset used for the task of predicting the occurrence of cerebral strokes in individuals.

Part I (see Stroke prediction using Logistic regression.

Handling Class Imbalance: Since stroke cases are rare in the dataset (class imbalance), we applied SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic samples of the minority class and balance the dataset.
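The core idea behind SMOTE, as described under Handling Class Imbalance, is to create a synthetic minority row by interpolating between a real minority row and one of its nearest minority neighbours. The sketch below is a simplified, dependency-free illustration of that idea only; it is not the imbalanced-learn implementation, and the function name `smote_like_oversample`, the two-feature rows and the parameter defaults are assumptions.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic rows by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen row (excluding the row itself).
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical stroke rows described by (age, avg_glucose_level).
minority = [(67.0, 228.7), (80.0, 105.9), (49.0, 171.2), (79.0, 174.1)]
new_rows = smote_like_oversample(minority, n_new=6)

for row in new_rows:
    print(tuple(round(v, 1) for v in row))
```

Oversampling of this kind is usually applied to the training split only, so that the test split still reflects the real class ratio.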
- GitHub - erma0x/stroke-prediction-model: Data exploration, preprocessing, analysis and building a stroke model prediction in the life of the patient.

Initially an EDA has been done to understand the features and later

Balanced accuracy score: 0.7162480376766092

                   Predicted No Stroke   Predicted Stroke
Actual No Stroke   780                   396
Actual Stroke      12                    40

Classification report columns: pre rec spe f1 geo iba sup

A subset of the original train data is taken using the filtering method for Machine Learning and Data Visualization purposes.

It is used to predict whether a patient is likely to get stroke based on the input parameters like age, various diseases, bmi, average glucose level and smoking status.

Factors such as age, body mass index, smoking status, average glucose level, hypertension and heart disease are critical risk factors for stroke.

The analysis includes linear and logistic regression models, univariate descriptive analysis, ANOVA, and chi-square tests, among others.

Predicted stroke risk with 92% accuracy by applying logistic regression, random forests, and deep learning on health data.

ipynb at main

Contribute to manop-ph/stroke-prediction-dataset development by creating an account on GitHub.

The number 0 indicates that no stroke risk was identified, while the value 1 indicates that a stroke risk was detected.

To determine which model is the best to make stroke predictions, I plotte…
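As a worked check, the usual metrics can be recomputed from the confusion matrix quoted above (780, 396, 12, 40), treating "Stroke" as the positive class. The variable names are chosen for this illustration.

```python
# Confusion matrix cells, with "Stroke" as the positive class.
tn, fp = 780, 396  # actual no stroke: predicted no stroke / predicted stroke
fn, tp = 12, 40    # actual stroke:    predicted no stroke / predicted stroke

recall = tp / (tp + fn)        # share of real stroke cases that were caught
specificity = tn / (tn + fp)   # share of non-stroke cases correctly cleared
precision = tp / (tp + fp)     # share of stroke predictions that were right
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)
balanced_accuracy = (recall + specificity) / 2

print(f"recall={recall:.2f} specificity={specificity:.2f} precision={precision:.2f}")
print(f"f1={f1:.2f} accuracy={accuracy:.2f} balanced_accuracy={balanced_accuracy:.4f}")
```

The model catches most stroke cases (recall about 0.77) but at the cost of many false alarms (precision about 0.09), which is why plain accuracy alone is a poor summary on this imbalanced dataset.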
- KSwaviman/EDA-Clustering-Classification-on-Stroke-Prediction-Dataset

This project describes a step-by-step procedure for building a machine learning (ML) model for stroke prediction and for analysing which features are most useful for the prediction.

It includes data preprocessing (label encoding, KNN imputation, SMOTE for balancing), and trains models like Naive Bayes, Decision Tree, SVM, and Logistic Regression.

The input data is sourced from Kaggle, and this dataset is severely imbalanced, so we need to apply techniques like undersampling to balance the data.

Contribute to renjinirv/Stroke-prediction-dataset development by creating an account on GitHub. - JuanS286/StrokeClassifier

This project looks to create a stroke classifier to predict the likelihood of a patient having a stroke.

In addition to the features, we also show results for stroke prediction when principal components are used as the input.

ipynb at main · terickk/stroke-prediction-dataset

I have taken this dataset from Kaggle.

Working with a dataset consisting of lifestyle and physical data in order to build a model for predicting strokes - R-C-McDermott/Stroke-prediction-dataset

The system uses data pre-processing to handle character values as well as null values.

Easy Ensemble AdaBoost Classifier Balanced Accuracy Score: 0.

csv │ │ ├── stroke_data_engineered.

To predict what factors influence a person's stroke, I will utilize the stroke variable as the dependent variable.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4088 entries, 25283 to 31836
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   gender          4088 non-null   object
 1   age             4088 non-null   float64
 2   hypertension    4088 non-null   int64
 3   heart_disease   4088 non-null   int64
 4   ever_married    4088 non-null   object
 5   work_type       4088 non-null   object
 6   Residence_type  4088 non-null

In this stroke prediction model we have implemented Logistic Regression, Random Forest & LightGBM.

Optimized dataset, applied feature engineering, and implemented various algorithms.

This study uses the "healthcare-dataset-stroke-data" from Kaggle, which includes 5110 observations and 12 attributes, to predict stroke occurrence.

Take it to the Real World: We need to use our model to make predictions using unseen data to see how it performs.

o Visualize the relation between stroke and other features by using pandas crosstab and seaborn heatmap.

This package can be imported into any application for adding security features.

98% accurate - This stroke risk prediction Machine Learning model utilises ensemble machine learning (Random Forest, Gradient Boosting, XGBoost) combined via voting classifier.

So I used a sampling technique to solve that problem.

You need to download the 'Stroke Prediction Dataset' data using the library Scikit learn; ref is given below.

Summary without Implementation Details# This dataset contains a total of 5110 datapoints, each of them describing a patient, whether they have had a stroke or not, as well as 10 other variables, ranging from gender, age and type of work

Predicting whether a patient is likely to get stroke or not - stroke-prediction-dataset/code.

I used Logistic Regression with manual class weights since the dataset is imbalanced.
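One way to arrive at class weights for an imbalanced target, and to see how they enter the logistic loss, is sketched below. The "balanced" heuristic n_samples / (n_classes * class_count) is an assumption here: the source only says the weights were set manually and does not give their values. The toy labels and function names are also illustrative.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Weighted average of the per-row logistic loss (binary labels 0/1)."""
    total_weight = sum(weights[y] for y in y_true)
    loss = -sum(
        weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(y_true, p_pred)
    )
    return loss / total_weight


# Hypothetical toy target: 95 non-stroke rows and 5 stroke rows.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)

# An uninformative model that always predicts 0.5 scores log(2) under any weights.
loss = weighted_log_loss(labels, [0.5] * len(labels), weights)
print(round(loss, 4))
```

With these weights a misclassified stroke row costs about 19 times as much as a misclassified non-stroke row, which pushes the model toward higher recall on the rare class.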
The goal is to predict whether a person will suffer from a stroke, with the help of several easily measurable predictors such as smoking, hypertension and age.

Stroke Prediction for Preventive Intervention: Developed a machine learning model to predict strokes using demographic and health data.

BhanuMotupalli / Heart Stroke Prediction Dataset. Created March 22, 2023 21:03.

The dataset is preprocessed and analyzed, and multiple models are trained to achieve the best prediction accuracy.

Tools: Jupyter Notebook, Visual Studio Code, Python, Pandas, Numpy, Seaborn, MatPlotLib, Supervised Machine Learning Binary Classification Model, PostgreSQL, and Tableau.

An exploratory data analysis (EDA) and various statistical tests performed on a dataset focused on stroke prediction.

csv │ │ └── stroke_data_final.