Singularization / Pluralization in C#

Singularize: to make (a word, etc.) singular

Recently I needed to singularize words in C# – specifically, I needed to singularize table names from Northwind in an automatic fashion. There are like… eight tables, but who wants to do that by hand?

I was able to locate code for pluralizing words, but nothing for singularization. So I converted the pluralization code and made my own singularization class.

Pluralizing Apple makes it Apples.

Singularizing Apples makes it Apple.
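Converting the rules mostly means running each pluralization rule backwards: the plural rule's output pattern becomes the singular rule's match pattern, and vice versa. Here is a minimal sketch of that idea. It assumes two illustrative rules of my own (they are not copied from the original pluralization code), and the class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical rule pairs for illustration only; not taken from the original
// pluralizer. Each singular rule swaps the roles of pattern and replacement
// in the matching plural rule.
public static class RuleInversionDemo
{
    // Plural rule: consonant + "y" -> consonant + "ies" (sky -> skies).
    public static string PluralizeY(string word) =>
        Regex.Replace(word, "([^aeiou])y$", "$1ies");

    // Inverse rule: consonant + "ies" -> consonant + "y" (skies -> sky).
    public static string SingularizeIes(string word) =>
        Regex.Replace(word, "([^aeiou])ies$", "$1y");

    // Default plural rule: append "s" (apple -> apples).
    public static string PluralizeDefault(string word) => word + "s";

    // Inverse default rule: drop one trailing "s" (apples -> apple).
    public static string SingularizeDefault(string word) =>
        Regex.Replace(word, "(.+)s$", "$1");

    public static void Main()
    {
        foreach (var word in new[] { "sky", "fly" })
        {
            string plural = PluralizeY(word);
            Console.WriteLine($"{word} -> {plural} -> {SingularizeIes(plural)}");
        }

        string apples = PluralizeDefault("apple");
        Console.WriteLine($"apple -> {apples} -> {SingularizeDefault(apples)}");
    }
}
```

Running it prints "sky -> skies -> sky", "fly -> flies -> fly" and "apple -> apples -> apple". The hard part is not the inversion itself but the ordering and exceptions, which is what most of the rule list below deals with.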

If you need plurals, use the pluralization code located here; if you need singulars, use my singularization code:

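using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Pingularizer
{
    public class Singularizer
    {
        // Words whose plural is spelled the same as the singular.
        // Lookups ignore case, so "Equipment" and "equipment" both match.
        private static readonly ISet<string> Unpluralizables =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase)
                {
                    "equipment",
                    "information",
                    "rice",
                    "money",
                    "species",
                    "series",
                    "fish",
                    "sheep",
                    "deer"
                };

        // Rules are tried in order and the first match wins, so they live in an
        // ordered array rather than a Dictionary (whose enumeration order is not
        // guaranteed). Every pattern is anchored with "$" so it only fires on the
        // end of the word, never in the middle (e.g. "Events", "versions").
        private static readonly (string Pattern, string Replacement)[] Singularizations =
            {
                // Start with the rarest cases, and move to the most common
                ("people$", "person"),
                ("oxen$", "ox"),
                ("children$", "child"),
                ("feet$", "foot"),
                ("teeth$", "tooth"),
                ("geese$", "goose"),
                // And now the more standard rules.
                // e.g. wives -> wife, wolves -> wolf
                ("(.*)ives$", "$1ife"),
                ("(.*)ves$", "$1f"),
                ("(.*)men$", "$1man"),
                ("(.+[aeiou])ys$", "$1y"),
                ("(.+[^aeiou])ies$", "$1y"),
                ("(.+)zes$", "$1"),
                // Only the whole words "mice" and "lice", so "slice" is left alone.
                ("^([ml])ice$", "$1ouse"),
                // e.g. matrices -> matrix, indices -> index
                ("matrices$", "matrix"),
                ("indices$", "index"),
                ("(.+[^aeiou])ices$", "$1ice"),
                ("(.*)ices$", "$1ex"),
                ("(octop|vir)i$", "$1us"),
                ("(.+(s|x|sh|ch))es$", "$1"),
                ("(.+)s$", "$1")
            };

        public static string Singularize(string word)
        {
            if (string.IsNullOrEmpty(word) || Unpluralizables.Contains(word))
            {
                return word;
            }

            foreach (var (pattern, replacement) in Singularizations)
            {
                if (Regex.IsMatch(word, pattern))
                {
                    return Regex.Replace(word, pattern, replacement);
                }
            }

            // No rule matched: assume the word is already singular.
            return word;
        }

        // Heuristic: a word counts as plural if any singularization rule matches.
        // Unpluralizable words also count as plural, since their plural form is
        // identical to the singular.
        public static bool IsPlural(string word)
        {
            if (string.IsNullOrEmpty(word))
            {
                return false;
            }

            if (Unpluralizables.Contains(word))
            {
                return true;
            }

            foreach (var (pattern, _) in Singularizations)
            {
                if (Regex.IsMatch(word, pattern))
                {
                    return true;
                }
            }

            return false;
        }
    }
}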
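Because the first matching rule wins, the order of the list matters as much as the patterns themselves. The self-contained sketch below uses a subset of the rules above, stored in an ordered array of tuples (an assumption of this sketch; the class and member names are hypothetical), and runs them over some of the Northwind table names. It also shows what happens to "slices" if the general "-ices" fallback is placed ahead of the more specific consonant + "ices" rule:

```csharp
using System;
using System.Text.RegularExpressions;

// A trimmed-down subset of the post's rules, used to show that rules are
// tried in order and the first match wins.
public static class NorthwindDemo
{
    public static readonly (string Pattern, string Replacement)[] OrderedRules =
    {
        ("matrices$", "matrix"),
        ("indices$", "index"),
        ("(.+[^aeiou])ices$", "$1ice"),   // slices -> slice
        ("(.*)ices$", "$1ex"),            // general -ices fallback
        ("(.+[^aeiou])ies$", "$1y"),      // Categories -> Category
        ("(.+(s|x|sh|ch))es$", "$1"),     // boxes -> box
        ("(.+)s$", "$1"),                 // Orders -> Order
    };

    // Same rules, but the general -ices fallback now comes before the more
    // specific consonant + "ices" rule, so the specific rule never fires.
    public static readonly (string Pattern, string Replacement)[] MisorderedRules =
    {
        ("matrices$", "matrix"),
        ("indices$", "index"),
        ("(.*)ices$", "$1ex"),
        ("(.+[^aeiou])ices$", "$1ice"),
        ("(.+[^aeiou])ies$", "$1y"),
        ("(.+(s|x|sh|ch))es$", "$1"),
        ("(.+)s$", "$1"),
    };

    public static string Singularize(string word, (string Pattern, string Replacement)[] rules)
    {
        foreach (var (pattern, replacement) in rules)
        {
            if (Regex.IsMatch(word, pattern))
            {
                return Regex.Replace(word, pattern, replacement);
            }
        }

        return word; // No rule matched: treat the word as already singular.
    }

    public static void Main()
    {
        string[] tables =
        {
            "Categories", "Customers", "Employees", "Order Details", "Orders",
            "Products", "Region", "Shippers", "Suppliers", "Territories"
        };

        foreach (var table in tables)
        {
            Console.WriteLine($"{table} -> {Singularize(table, OrderedRules)}");
        }

        Console.WriteLine(Singularize("slices", OrderedRules));    // slice
        Console.WriteLine(Singularize("slices", MisorderedRules)); // slex
    }
}
```

"Region" passes through unchanged because no rule matches it, and "Territories" becomes "Territory" via the consonant + "ies" rule. The misordered list reproduces the "slex" result that the consonant + "ices" rule exists to prevent.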

And here are some unit tests:

using System.Collections.Generic;
using NUnit.Framework;

namespace Pingularizer
{
    [TestFixture]
    public class SingularizerTests
    {
        [Test]
        public void StandardSingularizationTests()
        {
            Dictionary<string, string> dictionary = GetTestDictionary();

            foreach (var singular in dictionary.Keys)
            {
                var plural = dictionary[singular];
                Assert.AreEqual(singular, Singularizer.Singularize(plural));
            }
        }

        [Test]
        public void IrregularSingularizationTests()
        {
            var dictionary = new Dictionary<string, string>();
            dictionary.Add("person", "people");
            dictionary.Add("child", "children");
            dictionary.Add("ox", "oxen");

            foreach (var singular in dictionary.Keys)
            {
                var plural = dictionary[singular];
                Assert.AreEqual(singular, Singularizer.Singularize(plural));
            }
        }

        [Test]
        public void NonSingularizationPluralizationTests()
        {
            var nonPluralizingWords = new List<string> { "equipment", "information", "rice", "money", "species", "series", "fish", "sheep", "deer" };

            foreach (var word in nonPluralizingWords)
            {
                Assert.AreEqual(word, Singularizer.Singularize(word));
            }
        }

        private Dictionary<string, string> GetTestDictionary()
        {
            Dictionary<string, string> dictionary = new Dictionary<string, string>();
            dictionary.Add("sausage", "sausages"); // Most words - Just add an 's'
            dictionary.Add("status", "statuses"); // Words that end in 's' - Add 'es'
            dictionary.Add("ax", "axes"); // Words that end in 'x' - Add 'es'
            dictionary.Add("octopus", "octopi"); // Some words that end in 'us' - Replace 'us' with 'i'
            dictionary.Add("virus", "viri"); // Some words that end in 'us' - Replace 'us' with 'i'
            dictionary.Add("crush", "crushes"); // Words that end in 'sh' - Add 'es'
            dictionary.Add("crutch", "crutches"); // Words that end in 'ch' - Add 'es'
            dictionary.Add("matrix", "matrices"); // Words that end in 'ix' - Replace with 'ices'
            dictionary.Add("index", "indices"); // Words that end in 'ex' - Replace with 'ices'
            dictionary.Add("mouse", "mice"); // Some words that end in 'ouse' - Replace with 'ice'
            dictionary.Add("quiz", "quizzes"); // Words that end in 'z' - Add 'zes'
            dictionary.Add("mailman", "mailmen"); // Words that end in 'man' - Replace with 'men'
            dictionary.Add("man", "men"); // Words that end in 'man' - Replace with 'men'
            dictionary.Add("wolf", "wolves"); // Words that end in 'f' - Replace with 'ves'
            dictionary.Add("wife", "wives"); // Words that end in 'fe' - Replace with 'ves'
            dictionary.Add("day", "days"); // Words that end in '[vowel]y' - Replace with 'ys'
            dictionary.Add("sky", "skies"); // Words that end in '[consonant]y' - Replace with 'ies'
            return dictionary;
        }
    }
}
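An unanchored pattern such as "(.*)ves?" or "(.+)s" can fire in the middle of a word, which is the problem reported in the comments for words like "Events", "Versions" and "Universals". This small, self-contained check compares unanchored patterns with the same patterns anchored by "$". The helper name ApplyRule is hypothetical; it mirrors the "replace only if it matches" approach used in Singularize:

```csharp
using System;
using System.Text.RegularExpressions;

// Compares unanchored rule patterns with "$"-anchored versions on words that
// merely contain the matched letters somewhere in the middle.
public static class AnchorDemo
{
    // Replace only if the pattern matches; otherwise return the word unchanged.
    public static string ApplyRule(string word, string pattern, string replacement) =>
        Regex.IsMatch(word, pattern) ? Regex.Replace(word, pattern, replacement) : word;

    public static void Main()
    {
        // Unanchored: the rule fires on any "ve", "ive" or "s" inside the word.
        Console.WriteLine(ApplyRule("Events", "(.*)ves?", "$1f"));        // Efnts
        Console.WriteLine(ApplyRule("versions", "(.*)ves?", "$1f"));      // frsions
        Console.WriteLine(ApplyRule("Universals", "(.*)ives?", "$1ife")); // Unifersals
        Console.WriteLine(ApplyRule("Customer", "(.+)s", "$1"));          // Cutomer

        // Anchored: the rule fires only when the word ends with the suffix.
        Console.WriteLine(ApplyRule("Events", "(.*)ves$", "$1f"));        // Events
        Console.WriteLine(ApplyRule("wolves", "(.*)ves$", "$1f"));        // wolf
        Console.WriteLine(ApplyRule("Customer", "(.+)s$", "$1"));         // Customer
    }
}
```

The last case matters for the Northwind use case: with an unanchored "(.+)s", a name that is already singular but contains an "s" gets mangled, while the anchored version leaves it alone.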

About mfagerlund
Writes code in my sleep - and sometimes it even compiles!

7 Responses to Singularization / Pluralization in C#

  1. Matt Grande says:

    Great work, and thanks for using my pluralizer for inspiration. I’ve added your code to my project as well!

  2. Pingback: (Silverlight) Switching between LINQtoSQL and Entity Framework « Mattias Fagerlund's Coding Blog

  3. chris says:

    Nice. One suggestion though: consider one of the two options below.

    1) Give the method a more accurate name. Since the method returns string.Empty when Regex.IsMatch evaluates to false, Singularize isn’t entirely accurate, because it will return string.Empty when a singular is passed in. If you prefer to keep the name as is, then throw an exception when the caller breaks the contract defined in the signature by passing in something other than a plural.

    2) Make the method’s behavior consistent with its name. First, change the parameter name to word. Then, if Regex.IsMatch returns false, return the original word.

    Personally, I like option 2 best.

  4. Mark Renouf says:

    Fix for “prices”, “slices”, etc. Without it you get “prex” and “slex”:

    replace("(.+[^aeiou])ices$").with("$1ice"),

    This is from my Java version I made from your examples. I hope you don’t mind the reuse (I assumed that’s why you posted it on your blog :-))

    https://gist.github.com/805745 (Code)
    https://gist.github.com/805746 (Unit Test)

  5. Jamie says:

    Thanks. This was helpful.

    There are a few problem words such as “Versions”, “Universals”, and “Events”. The fixes were to change the “?” to “$” in two lines, {"(.*)ives$", "$1ife"} and {"(.*)ves$", "$1f"}, and to add a $ to the last one, {"(.+)s$", @"$1"}.

  6. Pingback: Is there any way to convert English verbs to it's present tense on C# - Popular Windows Phone Questions
