Free language processing service and NLP C# code
Free English Tokenizing Service: Efficient and Reliable Tokenizer

  1. Provide your text by typing it, pasting it, or uploading a file.
  2. Choose your options.
  3. Click 'Tokenize'.

In computational linguistics, tokenizing is the process of splitting a piece of text into a list of tokens. A token can be a word, a number, a symbol, or a punctuation mark. For example, sentence a) should be tokenized into the tokens in b):
  a) Dr. Covington lent me $50.
  b) 'Dr.', 'Covington', 'lent', 'me', '$', '50', '.'

Tokenizing cannot be done naively by splitting text on white space and punctuation marks. The main challenge lies in the proper treatment of symbols, digits, and punctuation marks, as the example above shows: the period in 'Dr.' stays attached to the abbreviation, whereas the dollar sign and the sentence-final period become separate tokens.
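
To make the difficulty concrete, here is a minimal Python sketch that contrasts a naive white-space split with a small rule-based tokenizer. It is not the algorithm behind this service, only an illustration under stated assumptions: the names naive_tokenize, tokenize, ABBREVIATIONS, and TOKEN_PATTERN are hypothetical, and the abbreviation list is deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); a practical tokenizer would need a far larger one.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}

# Alternatives are tried in order: a number (optionally with a decimal part),
# a word (optionally followed by a period), or any single non-space symbol.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+\.?|[^\w\s]")


def naive_tokenize(text):
    """Split on white space only; symbols and punctuation stay glued to words."""
    return text.split()


def tokenize(text):
    """Split text into words, numbers, symbols, and punctuation marks."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group()
        # Keep the period on known abbreviations; otherwise split it off.
        if len(token) > 1 and token.endswith(".") and token not in ABBREVIATIONS:
            tokens.extend([token[:-1], "."])
        else:
            tokens.append(token)
    return tokens


sentence = "Dr. Covington lent me $50."
print(naive_tokenize(sentence))  # ['Dr.', 'Covington', 'lent', 'me', '$50.']
print(tokenize(sentence))        # ['Dr.', 'Covington', 'lent', 'me', '$', '50', '.']
```

The naive split leaves '$50.' as a single token, while the rule-based version reproduces the token list in b). Even so, this sketch mishandles many real cases, such as abbreviations missing from the list, contractions, and hyphenated words, which is why a dependable tokenizer needs considerably more rules.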

Although it may seem trivial, tokenizing is a prerequisite for almost all advanced natural language processing tasks.
