Free language processing service and NLP C# code
Remove Non-alphanumeric Characters

NLP Task

Given a string, remove the first and last characters if they are not letters or numbers.

Algorithms and Implementation

The char type has a number of static methods for checking the category of a character, such as whether it is a letter, a digit, or a punctuation mark. Any character of a string can be accessed by its zero-based index. For example, if holiday holds the string "holiday", then holiday[1] is the character 'o'. The string class has a Substring instance method that returns part of a string. The following sample code of a Console application shows how these pieces fit together. In your program, place the cursor on a method name and press F1 to bring up the .NET Framework SDK documentation for that method.
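
As a quick illustration of the three building blocks just described, the following minimal sketch indexes into a string, classifies characters with the static char methods, and takes substrings. The class name CharBasics and the sample values are made up for this example and are not part of the original sample.

```csharp
using System;

class CharBasics
{
    static void Main()
    {
        string holiday = "holiday";

        // Indexing is zero-based, so index 1 is the second character.
        char second = holiday[1];
        Console.WriteLine( second );                          // o

        // Static char methods classify a single character.
        Console.WriteLine( char.IsLetterOrDigit( 'o' ) );     // True
        Console.WriteLine( char.IsLetterOrDigit( '!' ) );     // False
        Console.WriteLine( char.IsPunctuation( '!' ) );       // True

        // Substring( start ) keeps everything from start to the end;
        // Substring( start, length ) keeps length characters from start.
        Console.WriteLine( holiday.Substring( 1 ) );          // oliday
        Console.WriteLine( holiday.Substring( 0, 3 ) );       // hol
    }
}
```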

The algorithm is as follows. If the string both starts and ends with a non-alphanumeric character, we take the part without the first and last characters. If it starts with a non-alphanumeric character but does not end with one, we take the substring starting from index 1. If it ends with a non-alphanumeric character but does not start with one, we take all the characters except the last one. In all other cases, we return the original string. A string consisting of a single non-alphanumeric character is reduced to the empty string.

Code


using System;

namespace PurgeString
{
    class Program
    {
        static void Main( string[] args )
        {
            string input = string.Empty;

            // Loop until the user types 'q' or 'Q'.
            while( true )
            {
                Console.Write( "Type a test string or Q to quit: " );
                input = Console.ReadLine( );
                
                // Console.ReadLine returns null at end of input, so treat that as quit too.
                if ( input == null || input.ToLower( ) == "q" )
                {
                    break;
                }

                string output = RemovePuncts( input );
                Console.WriteLine( output + "\r\n" );
            }
        }

        /// <summary>
        ///  Strip off the starting non-alphanumeric character and the ending non-alphanumeric
        ///  character from a string.
        /// </summary>
        /// <param name="str">the string to process</param>
        /// <returns>the string with the starting and ending non-alphanumeric characters stripped</returns>
        private static string RemovePuncts( string str )
        {
            string newStr = str;
            if ( !string.IsNullOrEmpty( newStr ) )
            {
                int strLength = newStr.Length;

                bool punctHead = !char.IsLetterOrDigit( newStr[0] );
                bool punctEnd = !char.IsLetterOrDigit( newStr[strLength - 1] );

                // If at least one end is a non-alphanumeric character...
                if ( punctHead || punctEnd )
                {
                    if ( punctHead && punctEnd && strLength >= 2 )
                    {
                        // If both ends are non-alphanumeric characters, remove the first and last characters.
                        newStr = newStr.Substring( 1, strLength - 2 );
                    }
                    else if ( punctHead )
                    {
                        // If it starts with a non-alphanumeric character, remove the first character.
                        // A single non-alphanumeric character also lands here and becomes the empty string.
                        newStr = newStr.Substring( 1 );
                    }
                    else
                    {
                        // If it ends with a non-alphanumeric character, remove the last character.
                        newStr = newStr.Substring( 0, strLength - 1 );
                    }
                }
            }

            return newStr;
        }
    }
}

Figures

Fig. Remove Starting And Ending Non-alphanumeric Characters

Note

Currently, only the first and last characters of the string are examined, so at most one non-alphanumeric character is removed from each end. The code can be modified to remove all of the leading and trailing non-alphanumeric characters.
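
One possible way to make the modification mentioned in this note is sketched below. It walks inward from both ends until it reaches a letter or digit and then takes a single substring. The class name Trimmer and the method name TrimNonAlphanumeric are hypothetical and are not part of the original sample. This is only one approach under those assumptions.

```csharp
using System;

static class Trimmer
{
    // Hypothetical helper, not part of the original sample:
    // removes every leading and trailing non-alphanumeric character.
    public static string TrimNonAlphanumeric( string str )
    {
        if ( string.IsNullOrEmpty( str ) )
        {
            return str;
        }

        int start = 0;
        int end = str.Length - 1;

        // Advance past leading non-alphanumeric characters.
        while ( start <= end && !char.IsLetterOrDigit( str[start] ) )
        {
            start++;
        }

        // Back up past trailing non-alphanumeric characters.
        while ( end >= start && !char.IsLetterOrDigit( str[end] ) )
        {
            end--;
        }

        // If no letter or digit was found, the length is 0 and the result is empty.
        return str.Substring( start, end - start + 1 );
    }

    static void Main()
    {
        Console.WriteLine( TrimNonAlphanumeric( "--(hello, world)!!" ) );  // hello, world
        Console.WriteLine( TrimNonAlphanumeric( "abc" ) );                 // abc
        Console.WriteLine( TrimNonAlphanumeric( "?!" ).Length );           // 0
    }
}
```

Characters in the middle of the string, such as the comma and space in the first example, are left untouched, which matches the behavior of the original RemovePuncts method.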

