Free language processing service and NLP C# code
Camel Case String To Words

NLP Task

In a loose sense, camel case refers to the practice of concatenating words into a single string while capitalizing the first letter of each word, as in CamelCaseString, NumberOfTokens, and NaturalParagraphsToHTMLParagraphs. Camel case is widely used for naming identifiers in many programming languages. Our task is to split a camel case string back into its words, keeping runs of capitals such as HTML together. The preceding examples will be converted to Camel Case String, Number Of Tokens, and Natural Paragraphs To HTML Paragraphs.

Algorithms and Implementation

One algorithm is to rebuild a string like this:


    (1) Create an empty string s

    (2) Check every character c of the camel case string:

        append c and a space to s
           if c is in lower case and the character after it is in upper case, e.g.
              HtmlCode -> Html Code
           or if c is in upper case and it is followed by another upper case letter and then a lower case letter, e.g.
              HTMLCode -> HTML Code

        otherwise, append c only.
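
The steps above can be sketched directly as a character loop. This is only an illustrative sketch: the class name CamelCaseLoop and the helpers SplitCamelCase, IsLower and IsUpper are hypothetical names that do not appear in the original code, and the sketch assumes ASCII letters only.

```csharp
using System;
using System.Text;

static class CamelCaseLoop
{
    // Hypothetical helper: applies the two rules listed above to each character.
    public static string SplitCamelCase(string camel)
    {
        var s = new StringBuilder();              // step (1): start with an empty string
        for (int i = 0; i < camel.Length; i++)    // step (2): check every character c
        {
            char c = camel[i];
            s.Append(c);

            bool nextIsUpper = i + 1 < camel.Length && IsUpper(camel[i + 1]);
            bool afterNextIsLower = i + 2 < camel.Length && IsLower(camel[i + 2]);

            // Rule 1: lower case followed by upper case (HtmlCode -> Html Code).
            // Rule 2: upper case followed by upper case and then lower case
            //         (HTMLCode -> HTML Code).
            if ((IsLower(c) && nextIsUpper) ||
                (IsUpper(c) && nextIsUpper && afterNextIsLower))
            {
                s.Append(' ');
            }
        }
        return s.ToString();
    }

    // ASCII-only checks (an assumption of this sketch).
    static bool IsLower(char c) => c >= 'a' && c <= 'z';
    static bool IsUpper(char c) => c >= 'A' && c <= 'Z';
}
```

Calling CamelCaseLoop.SplitCamelCase("HTMLCode") walks the string once and inserts a single space after the L, because L is the only character that is followed by an upper case letter and then a lower case letter.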

However, we would rather solve this problem with the lookahead and lookbehind features of regular expressions. We simply insert a space at every position that satisfies one of two conditions. Case 1: looking behind from the position we find a lower case letter, and looking ahead we find an upper case letter. Case 2: looking behind we find an upper case letter, and looking ahead we find an upper case letter followed by a lower case letter. The regex that matches those positions is this:

              case 1     |          case 2

    ((?<=[a-z])(?=[A-Z]))|((?<=[A-Z])(?=[A-Z][a-z]))


    (1) '|' separates the alternative possibilities;
    (2) [a-z] means any letter in lower case;
    (3) [A-Z] means any letter in upper case;
    (4) (?<=) is a positive lookbehind marker;
    (5) (?=) is a positive lookahead marker.

Put together, this regular expression matches a position (rather than any text) that is either preceded by a letter in lower case and followed by a letter in upper case or preceded by a letter in upper case and followed by a letter in upper case and another letter in lower case. To reach our goal, we only need to replace those positions with spaces.
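
To see that the expression really matches positions rather than text, the following sketch lists the index of each match. The class name BoundaryDemo and the method BoundaryPositions are hypothetical names introduced only for this illustration; Regex.Matches, Match.Index and Match.Length are the standard .NET members.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class BoundaryDemo
{
    const string Pattern = "((?<=[a-z])(?=[A-Z]))|((?<=[A-Z])(?=[A-Z][a-z]))";

    // Hypothetical helper: returns the index of every boundary the pattern finds.
    public static List<int> BoundaryPositions(string camel)
    {
        var positions = new List<int>();
        foreach (Match m in Regex.Matches(camel, Pattern))
        {
            // m.Length is 0 for every match: lookarounds consume no characters,
            // so only the position (m.Index) carries information.
            positions.Add(m.Index);
        }
        return positions;
    }
}
```

For HTMLCode the only boundary is at index 4 (between L and C), and for NumberOfTokens the boundaries are at indexes 6 and 8. Replacing each of these empty matches with a space yields the split words.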

Code


using System;
using System.Text.RegularExpressions;

namespace CamelCaseStringToWords
{
    class Program
    {
        static void Main( string[] args )
        {
            string input = string.Empty;

            Console.WriteLine( "\r\n**** Convert Camel Case String to Words ****\r\n\r\n" );

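            // Loop until the user types 'q' or 'Q', or the input stream ends.
            while ( true )
            {
                Console.Write( "Type a camel case string to split, or Q or q to quit: " );
                input = Console.ReadLine( );

                // Console.ReadLine returns null at end of input, so check before calling ToLower.
                if ( input == null || input.ToLower( ) == "q" )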
                {
                    break;
                }

                // We use the static Replace method of the Regex class. It has several overloads.
                // This particular overload accepts three parameters:
                // (1) The input string to search (the camel case string, in our case)
                // (2) The regular expression
                // (3) The replacement string (a space, in our case)
                string newStr = Regex.Replace( input, "((?<=[a-z])(?=[A-Z]))|((?<=[A-Z])(?=[A-Z][a-z]))" , " " );

                Console.WriteLine( );
                Console.WriteLine( newStr );
                Console.WriteLine( );
            }
        }
    }
}

Figures

Fig. Camel Case String To Words

Note

(1) This code is useful if we want to display some variable names in a user-friendly way. For example, in the NLP Sample Code section of this website, I used this code to display the image names and set the alternative text of the image links programmatically.

(2) If we use the instance method of the Regex class rather than the static method, the related code would be:


       Regex regex = new Regex("((?<=[a-z])(?=[A-Z]))|((?<=[A-Z])(?=[A-Z][a-z]))");
       string newStr = regex.Replace(input, " ");
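
If the conversion is applied to many names, as in the image-name use mentioned in note (1), one possible arrangement is to keep a single Regex instance and reuse it. The sketch below is only a suggestion: the class name CamelCaseSplitter and the method ToWords are hypothetical and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class CamelCaseSplitter
{
    // One shared instance, so the pattern is constructed only once.
    static readonly Regex Boundary =
        new Regex("((?<=[a-z])(?=[A-Z]))|((?<=[A-Z])(?=[A-Z][a-z]))");

    // Hypothetical helper: inserts a space at every word boundary.
    public static string ToWords(string camel)
    {
        return Boundary.Replace(camel, " ");
    }
}
```

With this helper, CamelCaseSplitter.ToWords("NumberOfTokens") gives the same result as the static Regex.Replace call in the program above.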
