Rule of thumb: when to apply a log transformation

A log transformation is appropriate when the values are strictly positive (the logarithm is undefined for zero and negative numbers), there are no missing values, and we want to use a model whose predictions or accuracy suffer when the data is not normally distributed.
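The preconditions above can be checked before transforming. The following is a minimal sketch, not a definitive implementation: the helper name `can_log_transform` and the sample values are made up for illustration, and missing values are assumed to appear as `None` or NaN.

```python
import math

def can_log_transform(values):
    """Return True if every value is present, finite, and strictly positive."""
    for v in values:
        if v is None:               # missing value
            return False
        if not math.isfinite(v):    # NaN or infinity
            return False
        if v <= 0:                  # log is undefined for zero and negatives
            return False
    return True

# Made-up sample columns, for illustration only.
incomes = [1200.0, 850.0, 40000.0, 3100.0]
with_gap = [1200.0, None, 3100.0]
with_zero = [0.0, 5.0, 12.0]

ok_incomes = can_log_transform(incomes)
ok_gap = can_log_transform(with_gap)
ok_zero = can_log_transform(with_zero)

# Only transform once the check passes.
logged = [math.log(v) for v in incomes] if ok_incomes else None
```

When zeros are present, log(1 + x) (`math.log1p` in Python) is a common workaround, though it changes the interpretation of the transformed values.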

Other reasons to apply it:

  • Highly skewed distribution
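  • Want relative rather than absolute differences in a variable such as age to matter: on the raw scale, the gap between a 30-year-old and a 40-year-old equals the gap between a 70-year-old and an 80-year-old (10 years each). On the log scale the 30 to 40 gap (log 40 - log 30 ≈ 0.29) is larger than the 70 to 80 gap (log 80 - log 70 ≈ 0.13), so the log magnifies differences at the low end and compresses them at the high end, not the reverse.
    • Beware: after the log transformation the interpretation changes. Differences on the log scale correspond to ratios (percentage changes) on the original scale, not to additive differences.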
  • Need to decrease the impact of outliers
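
The effects listed above can be checked numerically. The sketch below assumes the natural log and uses a made-up right-skewed sample in which 400 plays the role of the outlier; the `skewness` helper is a hand-written population skewness, not a library function.

```python
import math

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up right-skewed sample; 400 is the outlier.
raw = [20, 22, 25, 27, 30, 35, 40, 60, 120, 400]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)      # strongly positive
skew_log = skewness(logged)   # still positive, but noticeably smaller

# Equal 10-year gaps on the raw scale become unequal on the log scale.
gap_young = math.log(40) - math.log(30)   # about 0.288
gap_old = math.log(80) - math.log(70)     # about 0.134

print(f"skew raw={skew_raw:.2f}, skew log={skew_log:.2f}")
print(f"gap 30->40={gap_young:.3f}, gap 70->80={gap_old:.3f}")
```

In this sample the log reduces the skew and pulls the outlier closer to the rest of the data, and the 30 to 40 gap ends up larger than the 70 to 80 gap.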

Note: If skewed data is expected in the real world, don't create a normally distributed column with a log transformation. Instead, use a different model, such as Poisson regression or another regression technique that works better with skewed data.
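
To illustrate the alternative named in the note, here is a minimal sketch of Poisson regression with a log link, fitted on the raw (untransformed) counts by Newton-Raphson. It is hand-rolled only to stay self-contained; in practice a statistics library would normally be used. The function name `fit_poisson`, the single-predictor setup, and the data are all assumptions made for illustration.

```python
import math

def fit_poisson(xs, ys, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(ys) / len(ys))   # start at the log of the mean count
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Negative Hessian (symmetric 2x2 matrix).
        h00 = sum(mus)
        h01 = sum(mu * x for x, mu in zip(xs, mus))
        h11 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up skewed count data, for illustration only.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, 3, 4, 8, 13]

b0, b1 = fit_poisson(xs, ys)
fitted = [math.exp(b0 + b1 * x) for x in xs]
print(f"b0={b0:.3f}, b1={b1:.3f}")
```

The model works with the skewed counts directly: the response is never log-transformed, and exp(b1) reads as the multiplicative change in the expected count per unit increase in x.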