
Starting with Natural Language Processing (NLP)

Updated: Mar 9

Almost all of these notes come from this source: https://pythonprogramming.net/

This course consists of 21 lessons.

1. Tokenizing words and sentences with NLTK

NLTK (a Python library) stands for Natural Language Toolkit.

NLTK can help with:

  • splitting sentences from paragraphs (1)

  • splitting up words (2)

  • recognizing the part of speech of those words

  • highlighting the main subjects

  • helping your machine to understand what the text is all about.

Mission: learn how to perform sentiment analysis (detecting the view, feeling, attitude, or opinion expressed in a text) using NLTK.

  • Using tokenizing for (1) and (2)

  • Machine learning with the Naive Bayes classifier

  • How to tie in Scikit-learn (sklearn) with NLTK

  • Training classifiers with datasets (using sklearn, etc.) ~~~ Model training step

  • Performing live, streaming sentiment analysis on a social network ~~~ Model testing step

After installing Python and the NLTK library, start with some vocabulary:


  1. Corpus (singular), corpora (plural): a body of text

  2. Lexicon: words and their meanings (each field has its own lexicon)

  3. Token: each "entity" that results from splitting text up according to some rules.


Question: Is a sentence a token? Is a word a token? Is a single letter a token? What do you think, and why? Please share your reasoning.
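One way to explore that question is to split the same text with different rules and look at what comes out. The sketch below is only an illustration using Python's built-in `re` module, not NLTK itself; the helper names `split_sentences` and `split_words` and the sample text are made up for this note. NLTK provides `sent_tokenize` and `word_tokenize` in `nltk.tokenize` for the same jobs, and they cope with cases (such as the period in "Mr. Smith") that these naive rules would get wrong.

```python
import re


def split_sentences(text):
    """Naive rule (an assumption for this sketch): a sentence ends at
    '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Naive rule: a token is a run of word characters,
    or a single punctuation mark."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "Hello there. How are you today? The weather is great!"

sentences = split_sentences(text)   # each sentence is a token under rule 1
words = split_words(sentences[0])   # each word or punctuation mark is a token under rule 2
letters = list(words[0])            # each character is a token under a third rule

print(sentences)
print(words)
print(letters)
```

With these rules, the same text yields sentence tokens, word tokens, or letter tokens, so what counts as a token depends on the rule used for splitting.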

2. Stop words with NLTK

(To be continued ...)



Warm Regards from Giang Nguyen
