Have you ever seen a bunch of ^M characters in a text file? This odd character at the end of a line can also be represented as a Ctrl+M or <CTRL>M. You don’t know what it is, and you want it to go away.
Today, I’ll help you understand what that odd ^M character is, why it shows up in some of your documents, and how to get rid of it.
All About the Newline
In text documents, lines are separated by what is called a newline (also known as a line break or end of line). Different operating systems have different character codes that represent this newline.
- DOS-based systems, including Windows, as well as a number of other, older non-Unix OSes use a carriage return (CR) character followed by a line feed (LF) character.
- Commodore and the old Apple OSes before OS X used a CR character. Since OS X+ is based partly on BSD, which is in turn based on Unix, the new Apple OSes use the Unix newline method described below.
- Unix and its derivatives (Linux, BSD, and others) all use the single LF character to represent a newline.
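If you’d like to see these characters for yourself, here is a minimal sketch using standard tools. The file name newline-demo.txt is just a made-up example; the commands only assume that printf, od, and cat are available, which is the case on typical Linux systems.

```shell
# Write one line with a DOS (CRLF) ending and one with a Unix (LF) ending
printf 'dos line\r\nunix line\n' > newline-demo.txt

# od -c prints every byte: \r is the CR, \n is the LF
od -c newline-demo.txt

# cat -v makes the CR visible as ^M, the way many editors display it
cat -v newline-demo.txt
```

The last command should show the first line as "dos line^M" and the second as a plain "unix line".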
So what does all of this mean to you? It means that text documents that come from a Windows system won’t always play nice in Linux: the extra CR at the end of each line is what shows up as ^M. The converse is also true. If you create a text file in Linux, many Windows programs will fail to recognize the single LF as a newline and will render the document without any line breaks.
dos2unix and unix2dos to the Rescue
Fortunately, there are a couple of very easy-to-use programs that make dealing with this file format mess much easier. They are dos2unix and unix2dos.
These programs do exactly what their names imply: dos2unix takes a file and converts all DOS-style newlines to Unix-style newlines. unix2dos takes a file and converts all Unix-style newlines to DOS-style newlines.
I put Ubuntu and CentOS in the title because I’m going to give instructions for installing these programs on each of these distros. Why just these two? They are the two that I work with most often and are representative of the lion’s share of what people are using these days. If you need help with a different distro, please let me know in a comment.
Installing dos2unix and unix2dos in Ubuntu
There aren’t any dos2unix or unix2dos packages that can be found in Synaptic; however, there is a package that will install them for you. Simply open up Synaptic and install the tofrodos package. If you are like me and prefer to do this from the command line, you can run the following command:
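A minimal sketch of that command, assuming a stock Ubuntu system where apt-get and sudo are available (the package name tofrodos is the one mentioned above):

```shell
# Install tofrodos from the Ubuntu repositories (needs administrator rights)
sudo apt-get install tofrodos
```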
That’s all there is to it. Not only will dos2unix and unix2dos install, but alias programs fromdos and todos will be installed as well. These additional programs work in the same manner, so it’s purely a matter of preference which ones you use.
Installing dos2unix and unix2dos in CentOS
I really thought that I would have to install these in CentOS, but amazingly, the programs are already installed by default. I tested this with a fresh install in a virtual machine, and the programs were there on the first boot.
So, CentOS users, you’re already good to go.
Fortunately, using these programs couldn’t be easier.
Let’s say that you are in a Terminal (Applications > Accessories > Terminal) viewing text files. Maybe you just downloaded a new WordPress plugin and you are reading the readme.txt file. It doesn’t really matter. However, there is a problem. The readme.txt file has a bunch of ^M characters at the end of each line, and it’s really distracting.
Simply exit out of the editor you are currently in, since the file will be modified, and run the following command:
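Here is a minimal sketch of that command. The first two lines only set up a throwaway sample file in a hypothetical dos2unix-demo folder so you can try it safely; with a real file, only the dos2unix line matters.

```shell
# Throwaway sample: a readme.txt with DOS (CRLF) line endings
mkdir -p dos2unix-demo
printf 'Plugin Name\r\nInstall notes\r\n' > dos2unix-demo/readme.txt

# Convert the file in place to Unix (LF) line endings
dos2unix dos2unix-demo/readme.txt

# Optional check: no ^M should show up any more
cat -v dos2unix-demo/readme.txt
```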
If there are multiple files, you can specify each one with a space separating each. For example:
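A sketch of the multi-file form follows. The file names readme.txt and license.txt and the dos2unix-demo folder are made up for illustration, and the setup lines just create sample files to convert.

```shell
# Throwaway samples with DOS (CRLF) line endings
mkdir -p dos2unix-demo
printf 'one\r\n' > dos2unix-demo/readme.txt
printf 'two\r\n' > dos2unix-demo/license.txt

# List every file to convert, separated by spaces
dos2unix dos2unix-demo/readme.txt dos2unix-demo/license.txt
```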
The program doesn’t produce any output. Simply reopen your text file and look at all the beautiful non-existent ^M’s.
The unix2dos program has the exact same syntax as dos2unix. However, I thought it might be helpful to describe a situation in which you might need to use it.
You’re working on a project. You’ve just sent out your batch of files, and another member of the project complains that you are being a jerk and removed all the newlines in your text files. This other member is most likely using a Windows application that doesn’t understand the Unix newline format. Rather than getting into a format war, it’s typically better and quicker to simply convert your text files to a DOS/Windows format.
If you have a folder full of files that all need to be converted, simply run:
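A sketch of one way to do that is to let the shell expand a wildcard into the list of files. The unix2dos-demo folder and its file names are hypothetical, and this assumes the folder contains only text files, since the wildcard matches everything in it.

```shell
# Throwaway samples with Unix (LF) line endings
mkdir -p unix2dos-demo
printf 'first\nsecond\n' > unix2dos-demo/notes.txt
printf 'third\n' > unix2dos-demo/todo.txt

# Convert every file in the folder to DOS (CRLF) line endings
unix2dos unix2dos-demo/*
```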
Now you can send these new files to your project group and hopefully avoid any more unproductive drama.
File Formats in Vi
If you happen to use Vi, you can change Vi back and forth between DOS and Unix modes for newlines with a simple command. “:set ff=dos” sets the editor to use DOS newline encoding and will save the file in a DOS-encoded format. “:set ff=unix” sets the editor to use Unix newlines and will save the file in a Unix format.
Note that changing the format to dos from unix will always work as expected. This is because any file that contains just LF characters for new lines will be converted to CRLF while lines that already end in CRLF will be left as is.
If your Vi config defaults to a unix format and you open a DOS file, you will see the ^M characters. You can either use the dos2unix conversion utility first or change Vi first to the dos format and then to unix.
If you’d like Vi to default to DOS or Unix formatting each time you start a new Vi session, you can add the setting to your ~/.vimrc file. In that file, add either “set ff=dos” or “set ff=unix” depending on your needs. Note the lack of the colon, :, in the .vimrc entries.
For more information on the ff or fileformat setting in Vi, check out the official documentation.
Maybe one day we won’t have to worry about these types of things. For now, however, it’s good to know the tools that make these problems easily manageable.
BTW, Happy Daylight Savings Day everyone. I hope you enjoyed the loss of an hour of sleep. 🙂