UFS stands for the Unix File System. It descended from the Fast File System developed by Kirk McKusick for the groundbreaking 4.2BSD release from Berkeley.
Unfortunately, the BSD and Solaris flavors of UFS split in the 1980s, and they are now incompatible: neither system can read the other's format. Mac OS X's UFS is also slightly different, but probably closer to BSD's variation. Linux can often mount either flavor, but read-only.
Unicode is based on the ASCII character code, which most of the world's computers use as bread and butter. Using 8-bit bytes, there are 256 different codes you can have, for 256 different characters. The first 128 are more or less standardized as you see on your keyboard, but the last 128 are encoded differently on Windows, Macintosh and Unix systems, old systems and new systems, in Western languages, Cyrillic (Russian) languages, Japanese, etc.
It's a mess. The bottom line is that there are too many languages, and too many characters, to fit them all into 256 slots.
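To make the mess concrete, here is a minimal Python sketch that decodes the same byte value under three different 8-bit encodings. The codec names (latin-1, mac_roman, cp1251) are Python's own names for ISO Latin-1, the classic Macintosh Western encoding and the Windows Cyrillic code page; the byte 0xE9 is an arbitrary choice for illustration.

```python
# One byte from the upper half (128-255) of the 8-bit range.
raw = bytes([0xE9])

# Python codec names for three common 8-bit encodings (illustrative choice).
encodings = ["latin-1", "mac_roman", "cp1251"]

# Decode the very same byte under each encoding.
decoded = {name: raw.decode(name) for name in encodings}

for name, char in decoded.items():
    print(f"{name:10} 0xE9 -> {char!r} (U+{ord(char):04X})")
```

Each encoding should report a different character for the one byte, which is exactly the ambiguity Unicode was meant to remove.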
Unicode was designed around 16-bit characters, for a total space of 65,536 different characters. (Later versions of the standard added code points beyond this 16-bit range.) The first 128 are the same as in ASCII, which makes things easier. The next block of 128 has accented characters as in ISO Latin-1, such as é, î, ü, and so on. The next 1500 characters cover most of the languages of Eastern Europe, plus Greek, Cyrillic, Hebrew and Arabic. Nine alphabets of India, several Southeast Asian and Caucasus alphabets, plus codes for Japanese, Chinese and Korean, cover the rest of the world. (Languages that are excluded are usually confined to small communities, or else are extinct languages studied by historians, linguists and anthropologists.)
Unicode also has a huge number of punctuation marks and special symbols, including 19 currency symbols, all of Adobe Symbol font and Zapf Dingbats font, fractions such as 5/6, about 90 arrow symbols, circled letters and numbers, box characters, chess pieces and zodiac signs.
Unicode characters are referred to by their hex code; for instance, the single curly quotes are U+2018 and U+2019. Here is what ‘Sarah’ would look like in Unicode, with quotes (compare with the ASCII example):
characters: | ‘ | S | a | r | a | h | ’ |
words: | 2018 | 0053 | 0061 | 0072 | 0061 | 0068 | 2019 |
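The row of 16-bit words above can be reproduced with a few lines of Python. This is only a sketch; the helper name code_points is made up for this example.

```python
def code_points(text):
    """Return each character's Unicode value as a 4-digit hex string."""
    return [f"{ord(ch):04X}" for ch in text]

# 'Sarah' wrapped in single curly quotes, U+2018 and U+2019.
quoted = "\u2018Sarah\u2019"

print(" | ".join(code_points(quoted)))
# prints: 2018 | 0053 | 0061 | 0072 | 0061 | 0068 | 2019
```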
Unicode is used internally as the main character code in Java (and also in Apple's ill-fated Newton). Adoption in existing computer systems is slow because it is such a fundamental change, but we're getting there. Many of the newer filesystems use Unicode to encode filenames.
Since the world has been using ASCII for so long, files and the internet generally work on 8-bit bytes. The preferred method in the West for encoding Unicode into byte files is UTF-8. If you stick to straight ASCII (the first 128 characters as on your keyboard), UTF-8 files are the same as text files for Mac, Windows and Unix (give or take the newline problem).
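A quick way to check the claim about straight ASCII is to encode the same text both ways and compare the bytes. This is a small Python sketch using the language's built-in codecs; the sample string is arbitrary.

```python
# Plain ASCII text: only characters from the first 128 codes.
text = "Sarah"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII text, the two encodings produce identical bytes.
same = ascii_bytes == utf8_bytes
print(same)
```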
Unicode characters with codes from 128 upward turn into 2-byte or 3-byte sequences in UTF-8. Converting between bytestream UTF-8 and 16-bit Unicode (known as UCS-2) inside a computer program is simple and straightforward and can be done in a few dozen lines of code. Converting between Unicode and existing 8-bit encodings such as ISO Latin-1 or Macintosh or Shift-JIS usually requires some sort of lookup table. Free source is available from the Unicode Consortium.
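As a rough illustration of how little code the UTF-8 conversion takes, here is a Python sketch of both directions for 16-bit values. It is not the Unicode Consortium's source: the function names utf8_encode and utf8_decode are invented for this example, and it assumes well-formed input, so it does no checking of continuation bytes or surrogate values.

```python
def utf8_encode(code):
    """Encode one 16-bit code value (0..0xFFFF) as UTF-8 bytes."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("this sketch handles 16-bit values only")
    if code < 0x80:                      # 7 bits: one byte, same as ASCII
        return bytes([code])
    if code < 0x800:                     # up to 11 bits: two bytes
        return bytes([0xC0 | (code >> 6),
                      0x80 | (code & 0x3F)])
    return bytes([0xE0 | (code >> 12),   # up to 16 bits: three bytes
                  0x80 | ((code >> 6) & 0x3F),
                  0x80 | (code & 0x3F)])


def utf8_decode(data):
    """Decode well-formed UTF-8 bytes into a list of 16-bit code values."""
    codes = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # one-byte sequence
            codes.append(lead)
            i += 1
        elif (lead & 0xE0) == 0xC0:      # two-byte sequence
            codes.append(((lead & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        elif (lead & 0xF0) == 0xE0:      # three-byte sequence
            codes.append(((lead & 0x0F) << 12)
                         | ((data[i + 1] & 0x3F) << 6)
                         | (data[i + 2] & 0x3F))
            i += 3
        else:
            raise ValueError("lead byte outside the 16-bit range of this sketch")
    return codes


# The left curly quote U+2018 becomes a 3-byte sequence.
print(utf8_encode(0x2018).hex())         # prints: e28098
```

A production converter would also reject malformed byte sequences and handle characters outside the 16-bit range, both of which this sketch leaves out.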
For more information on Unicode, visit the Unicode Consortium.
"Usage" also sometimes refers to a small help message that command-line programs print when you type in the wrong syntax.