Starting with Natural Language Processing (NLP)
- BehindpeaICT
- Nov 4, 2018
Updated: Mar 9
Almost all of these notes come from https://pythonprogramming.net/
This course consists of 21 lessons.
1. Tokenizing words and sentences with NLTK
NLTK (a Python library) stands for Natural Language Toolkit.
NLTK can help with:
splitting sentences from paragraphs (1)
splitting up words (2)
recognizing the part of speech of those words
highlighting the main subjects
helping your machine to understand what the text is all about.
Mission: learn how to perform sentiment analysis (detecting a view, feeling, attitude, or opinion) using NLTK.
Using tokenizing for (1) and (2)
Machine learning with the Naive Bayes classifier
How to tie in Scikit-learn (sklearn) with NLTK
Training classifiers with datasets (using sklearn, etc.) ~~~ the model training step
Performing live, streaming sentiment analysis on data from a social network ~~~ the model testing step
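To make the Naive Bayes step in the list above a little more concrete, here is a minimal sketch of the idea using only the Python standard library. It is not the course's code: NLTK ships its own NaiveBayesClassifier, and sklearn has Naive Bayes models too. The training sentences, the function names (tokenize, train_nb, classify) and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up training set (hypothetical data, for illustration only).
train = [
    ("great fun movie", "pos"),
    ("loved this great film", "pos"),
    ("boring bad movie", "neg"),
    ("hated this awful film", "neg"),
]

def tokenize(text):
    # Simplest possible rule: lowercase, then split on whitespace.
    return text.lower().split()

def train_nb(examples):
    # Count documents per label and word occurrences per label.
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(text, model):
    # Pick the label with the highest log-probability score.
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb(train)
print(classify("great film", model))          # pos
print(classify("awful boring movie", model))  # neg
```

The first call to train_nb is the "train model" step, and each call to classify is a very small version of the "test model" step. A real dataset would need far more examples than four sentences.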
After installing Python and the NLTK library, focus on this vocabulary:
Corpus (singular); Corpora (plural): a body of text
Lexicon: words and their meanings (each field has its own lexicon)
Token: each "entity" that results from splitting text up based on rules.
Question: Is a sentence a token? Is a word a token? Is a letter a token? What do you think, and why?
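One way to explore the question above is to tokenize the same text under three different rules. In NLTK the usual tools are sent_tokenize and word_tokenize from nltk.tokenize. The sketch below does not use them: it relies only on Python's re module so that the rules stay visible. The sample text and the regular expressions are simplistic assumptions (they would mishandle abbreviations such as "Mr.", for example).

```python
import re

text = "NLTK splits text into tokens. Is a word a token? It depends on the rule!"

# Rule 1: a token is a sentence (split after ., ! or ? followed by whitespace).
sentence_tokens = re.split(r"(?<=[.!?])\s+", text.strip())

# Rule 2: a token is a word or a single punctuation mark.
word_tokens = re.findall(r"\w+|[^\w\s]", text)

# Rule 3: a token is a single non-space character.
char_tokens = [ch for ch in text if not ch.isspace()]

print(sentence_tokens)
# ['NLTK splits text into tokens.', 'Is a word a token?', 'It depends on the rule!']
print(word_tokens[:6])
# ['NLTK', 'splits', 'text', 'into', 'tokens', '.']
print(char_tokens[:4])
# ['N', 'L', 'T', 'K']
```

All three results are lists of tokens, so whether a sentence, a word, or a letter counts as a token depends on the rule used to split the text.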
2. Stop words with NLTK
(To be continued ...)