Showing posts with the label Data Science

Classification metrics and their Use Cases

  In this blog, we will discuss commonly used classification metrics. We will cover Accuracy Score, Confusion Matrix, Precision, Recall, F-Score, and ROC-AUC, and then learn how to extend them to multi-class classification. We will also discuss which metric is most suitable in which scenario. First, let’s understand some important terms used throughout the blog:
  True Positive (TP): You predict that an observation belongs to a class, and it actually does belong to that class.
  True Negative (TN): You predict that an observation does not belong to a class, and it actually does not belong to that class.
  False Positive (FP): You predict that an observation belongs to a class, but it actually does not belong to that class.
  False Negative (FN): You predict that an observation does not belong to a class, but it actually does belong to that class.
  All classification metrics work on these four te...
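The four terms defined above can be counted directly from paired lists of actual and predicted labels. The following is a minimal sketch, not code from the post: the function name `confusion_counts`, the `positive` parameter, and the sample labels are all hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for one class treated as 'positive'.

    Hypothetical helper for illustration; the labels below are made up.
    """
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted the class, and it really is the class
        elif predicted != positive and actual != positive:
            tn += 1  # predicted not the class, and it really is not
        elif predicted == positive and actual != positive:
            fp += 1  # predicted the class, but it is not the class
        else:
            fn += 1  # predicted not the class, but it is the class
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Illustrative labels only (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```

For multi-class data, the same helper can be called once per class by passing that class as `positive`, which treats every other class as negative.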

Data Science Performance Metrics for Everyone

  Accuracy, recall, precision, sensitivity, specificity, and more: data scientists use so many performance metrics! How do you explain all of them to audiences with non-technical backgrounds? As a data scientist, I find it challenging, fun, and critical to my job to describe these concepts to everyone. This blog post will explain many performance metrics using common language and pictures so everyone at your company can understand them. Recently, I developed a machine learning model to predict which patients on dialysis will be admitted to the hospital in the next week. This model has received lots of attention in my company (Fresenius Medical Care North America), so I have presented its details to a wide range of audiences, including data scientists, data analysts, nurses, physicians, and even the C-suite. From experience, I have learned that everyone interprets ‘accuracy’ differently, so I have to be very careful to explai...

Binning for Feature Engineering in Machine Learning

  Using binning as a technique to quickly and easily create new features for use in machine learning. If you have trained your model and still think the accuracy can be improved, it may be time for feature engineering. Feature engineering is the practice of using existing data to create new features. This post will focus on a feature engineering technique called “binning”. It assumes a basic understanding of Python, Pandas, NumPy, and matplotlib. In most cases, links are provided for a deeper understanding of what is being used. If something doesn’t make sense, please leave a comment and I will try my best to elaborate.
  What is Binning?
  Binning is a technique that accomplishes exactly what it sounds like. It takes a column of continuous numbers and places them in “bins” based on ranges that we determine. This will give us a new categorical variable feature...
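The idea of placing continuous numbers into bins based on chosen ranges can be sketched with the standard library alone. This is an illustrative sketch, not the post's own code: the helper `bin_values`, the bin edges, the labels, and the sample ages are all assumptions. Each bin here includes its left edge and excludes its right edge.

```python
from bisect import bisect_right


def bin_values(values, edges, labels):
    """Map each number to the label of the bin [edges[i], edges[i + 1]).

    Hypothetical helper for illustration. `edges` must be sorted in
    ascending order; values outside every bin are mapped to None.
    """
    if len(labels) != len(edges) - 1:
        raise ValueError("need exactly one label per bin")
    binned = []
    for value in values:
        if not edges[0] <= value < edges[-1]:
            binned.append(None)  # falls outside all bins
            continue
        # bisect_right counts the edges <= value, so subtracting 1
        # gives the index of the bin whose left edge is <= value.
        binned.append(labels[bisect_right(edges, value) - 1])
    return binned


# Made-up example: turn a continuous "age" column into a categorical one.
ages = [3, 17, 18, 42, 65, 90]
edges = [0, 18, 65, 120]
labels = ["child", "adult", "senior"]
age_groups = bin_values(ages, edges, labels)
print(age_groups)
```

With Pandas, the same result is usually obtained with `pandas.cut`, passing the edges as `bins` and the names as `labels`; note that `pandas.cut` closes bins on the right by default, so `right=False` is needed to match the left-closed convention used in this sketch.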

25 Questions to Test Your Skills on Decision Trees

  1. What is the Decision Tree Algorithm?
  A Decision Tree is a supervised machine learning algorithm that can be used for both Regression and Classification problem statements. It divides the complete dataset into smaller subsets while, at the same time, an associated Decision Tree is incrementally developed. The final output is a tree with decision nodes and leaf nodes. A Decision Tree can operate on both categorical and numerical data.
  2. List some popular algorithms used for deriving Decision Trees, along with their attribute selection measures.
  Some of the popular algorithms used for constructing decision trees are:
  1. ID3 (Iterative Dichotomiser): Uses Information Gain as its attribute selection measure.
  2. C4.5 (Successor of ID3): Uses Gain Ratio as its attribute selection measure.
  3. CART (Classification and Re...
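As a rough illustration of the Information Gain measure mentioned for ID3, the sketch below computes Shannon entropy of the class labels and the reduction in entropy obtained by splitting on one categorical attribute. It is a minimal sketch under the standard textbook definition, not an implementation of ID3 itself; the function names and the tiny weather-style dataset are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the rows by the value of one categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - weighted


# Made-up data: "outlook" separates the classes perfectly, "windy" not at all.
outlook = ["sunny", "sunny", "rain", "rain"]
windy = [True, False, True, False]
play = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(outlook, play)
gain_windy = information_gain(windy, play)
print(gain_outlook, gain_windy)
```

An attribute selection step based on this measure would pick the attribute with the larger gain, which is `outlook` in this toy dataset.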