Stemming and lemmatizing with sklearn vectorizers

One of the most basic techniques in Natural Language Processing (NLP) is the creation of feature vectors based on word counts. scikit-learn provides efficient classes for this:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

If we want to build feature vectors over a vocabulary of stemmed or lemmatized words, how can we …


Applying operations on grouped dataframes in Pandas

Zip is its own inverse

Flatten nested lists with a list comprehension

Wrong feature preprocessing is a source of train-test leakage

Masking with Boolean arrays in Numpy

Site update: Breadcrumbs, taxonomies, paginators

Secondary sorting in Python

Hugo template snippets of new website features

Vacancy Recommender Hackathon with Spark

Getting a grip on programmer jargon (by Joran Welling)

Selecting user commands in style (Python)

Object Orientation: Observer Pattern

Object Orientation: Strategy Pattern

Calculating pi with Monte Carlo simulation



