TGSPOWERBI

Showing posts with the label Data Science

Classification metrics and their Use Cases

November 27, 2022

In this blog, we will discuss commonly used classification metrics. We will cover Accuracy Score, Confusion Matrix, Precision, Recall, F-Score, and ROC-AUC, and then learn how to extend them to multi-class classification. We will also discuss which metric is most suitable in which scenario. First, let’s understand some important terms used throughout the blog:
True Positive (TP): You predict that an observation belongs to a class, and it actually does belong to that class.
True Negative (TN): You predict that an observation does not belong to a class, and it actually does not belong to that class.
False Positive (FP): You predict that an observation belongs to a class, but it actually does not belong to that class.
False Negative (FN): You predict that an observation does not belong to a class, but it actually does belong to that class.
All classification metrics work on these four te...
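The four terms defined above can be counted directly from a list of actual labels and a list of predicted labels. The snippet below is a minimal sketch, not code from the post: the function name `confusion_counts`, the `positive` parameter, and the sample labels are all made up for illustration. It also derives precision and recall from the counts using their standard definitions.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` as the class of interest."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted "belongs to the class": correct -> TP, wrong -> FP
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted "does not belong": wrong -> FN, correct -> TN
            counts["FN" if actual == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}

# Made-up labels, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / len(y_true)
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
print(counts, accuracy, precision, recall)
```

For multi-class data, the same function can be called once per class by passing each class label as `positive`.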

Data Science Performance Metrics for Everyone

October 10, 2022

Accuracy, recall, precision, sensitivity, specificity, …: data scientists use so many performance metrics! How do you explain all of them to audiences with non-technical backgrounds? As a data scientist, I find describing these concepts to everyone challenging, fun, and critical to my job. This blog post will explain many performance metrics using common language and pictures so everyone at your company can understand them. Recently, I developed a machine learning model to predict which patients on dialysis will be admitted to the hospital in the next week. This model has received lots of attention in my company (Fresenius Medical Care North America), so I have presented the details of this model to a wide range of audiences including data scientists, data analysts, nurses, physicians, and even the C-suite. From experience, I have learned that everyone interprets ‘accuracy’ differently, so I have to be very careful to explai...

Binning for Feature Engineering in Machine Learning

September 17, 2022

Using binning as a technique to quickly and easily create new features for use in machine learning. If you have trained your model and still think the accuracy can be improved, it may be time for feature engineering. Feature engineering is the practice of using existing data to create new features. This post will focus on a feature engineering technique called “binning”. This post assumes a basic understanding of Python, Pandas, NumPy, and matplotlib. In most cases, links are provided for a deeper understanding of what is being used. If something doesn’t make sense, please leave a comment and I will try my best to elaborate. What is Binning? Binning is a technique that accomplishes exactly what it sounds like. It takes a column of continuous numbers and places the numbers in “bins” based on ranges that we determine. This will give us a new categorical variable feature...
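The idea of placing continuous numbers into bins based on chosen ranges can be sketched in a few lines. The example below is an illustration only and is not taken from the post: it uses the standard library rather than Pandas so that it runs on its own, and the function name `bin_values`, the age values, the bin edges, and the labels are all assumptions. In Pandas, `pandas.cut` is the usual tool for the same job.

```python
from bisect import bisect_right

def bin_values(values, edges, labels):
    """Assign each value to the labeled bin whose range contains it.

    Bins are left-closed: edges[i] <= value < edges[i + 1].
    Values that fall outside the edges are mapped to None.
    """
    if len(labels) != len(edges) - 1:
        raise ValueError("need exactly one label per bin")
    binned = []
    for value in values:
        # Index of the last edge that is <= value
        index = bisect_right(edges, value) - 1
        binned.append(labels[index] if 0 <= index < len(labels) else None)
    return binned

# Made-up ages and ranges, for illustration only
ages = [5, 17, 18, 42, 64, 65, 90]
edges = [0, 18, 65, 120]          # three bins: [0, 18), [18, 65), [65, 120)
labels = ["child", "adult", "senior"]

age_groups = bin_values(ages, edges, labels)
print(age_groups)
```

The result is a new categorical column (`age_groups`) derived from the continuous one. Whether a bin includes its left or right edge is a design choice worth stating explicitly, because values that sit exactly on an edge (18 and 65 here) change bins depending on it.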

25 Questions to Test Your Skills on Decision Trees

April 16, 2022

1. What is the Decision Tree Algorithm?
A Decision Tree is a supervised machine learning algorithm that can be used for both Regression and Classification problem statements. It divides the complete dataset into smaller subsets while, at the same time, an associated Decision Tree is incrementally developed. The final output of the Decision Tree algorithm is a tree with decision nodes and leaf nodes. A Decision Tree can operate on both categorical and numerical data.
2. List some popular algorithms used for deriving Decision Trees along with their attribute selection measures.
Some of the popular algorithms used for constructing decision trees are:
1. ID3 (Iterative Dichotomiser): Uses Information Gain as the attribute selection measure.
2. C4.5 (Successor of ID3): Uses Gain Ratio as the attribute selection measure.
3. CART (Classification and Re...
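Information Gain, the attribute selection measure named for ID3 above, is the drop in label entropy obtained by splitting the data on an attribute. The snippet below is a minimal sketch of that measure only, not a full ID3 implementation; the function names, the `outlook` and `windy` attributes, and the four sample rows are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on `attribute`."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        # Group the labels by the value each row takes for the attribute
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder

# Made-up toy data, for illustration only
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["yes", "yes", "no", "no"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
print(gain_outlook, gain_windy)
```

In this toy data, `outlook` separates the two classes perfectly while `windy` tells us nothing about the label, so a split on `outlook` has the higher gain and would be chosen first.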