NLP (colab notebook)

notebook about NLP and frequency-based vectorization methods (bag of words + TF/IDF)
2025-12-14 12:40
// updated 2025-12-21 13:45

Notes from a lecture (2025-12-14) about natural language processing (NLP):

(accessible via link, since Google Colab notebooks cannot be embedded in an iframe)

Topics covered

  • NLP
    • Tokenization
    • Stop words removal
    • n-grams
    • Vectorization
      • Count-based models
        • Bag of words
        • TF/IDF
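
As a rough illustration of how the topics above fit together, here is a minimal pure-Python sketch (not taken from the notebook itself). The toy corpus, the stop-word list, and all function names are hypothetical, and the TF/IDF weighting assumes the plain textbook form tf * ln(N / df). Libraries such as scikit-learn use a smoothed variant by default, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stop-word list, chosen only for illustration.
DOCS = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "Cats and dogs are pets",
]
STOP_WORDS = {"the", "on", "and", "are"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(token_lists):
    """Build a sorted vocabulary and one raw-count vector per document."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Weight counts by tf * idf, with tf = count / doc length, idf = ln(N / df)."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    weighted = []
    for vec in vectors:
        total = sum(vec) or 1  # guard against empty documents
        weighted.append(
            [(vec[j] / total) * math.log(n_docs / df[j]) for j in range(n_terms)]
        )
    return weighted


tokens_per_doc = [remove_stop_words(tokenize(d)) for d in DOCS]
bigrams = ngrams(tokens_per_doc[0], 2)
vocab, counts = bag_of_words(tokens_per_doc)
weights = tf_idf(counts)
```

In this sketch, "sat" appears in two of the three documents, so it gets a lower TF/IDF weight in the first document than "cat", which appears in only one.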

Further takeaways

  • the earlier post about NLP expands on the topics covered above
  • the next post, about NLP pre-processing steps, contains more Colab-notebook-style examples