Tokenization and Sentence Annotation