
Stemming and lemmatizing with sklearn vectorizers

One of the most basic techniques in Natural Language Processing (NLP) is the creation of feature vectors based on word counts.

scikit-learn provides efficient classes for this:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

If we want to build feature vectors over a vocabulary of stemmed or lemmatized words, how can …


