Oracle, Life, etc.

Oracle, Life, etc.

Archive for September 11th, 2006

Chapter 21 – Managing Globalization in Oracle Databases

Posted by mnsinger on September 11, 2006

  • NLS stands for National Language Support but it entails much more than just language.

Globalization Capabilities

Character Sets

  • Some applications (MS Word) use the character sets provided with the OS, while Oracle provides its own character sets which means that it is the same on all platforms.
  • Single-byte character sets use only one byte (8 bits or less) per character.
  • Multi-byte character sets use one, two or three bytes per character.
  • Fixed width character sets use the same number of bytes for each character.
  • Varying width character sets use a different number of bytes per character.
  • Single-byte seven-bit – uses a single byte per character but only 7 of the 8 bits per character
    • 2^7 total possible characters (128)
    • Usually inadequate
    • Oracle default (US7ASCII)
  • Single-byte eight-bit – uses a single byte per character and 8 bits per character
    • 2^8 total possible characters (256)
    • Adequate for Western European languages but perhaps not Eastern Europe and definitely not Asian languages (ex. WE8ISO8859P15)
  • Fixed-width multibyte
    • Unicode (AL16UTF16)
  • Varying-width single-byte
    • Characters may be from one to four bytes
    • Unicode (UTF8)
  • Varying-width multibyte
    • May use shift-out code to indicate that the following bytes are double-byte characters and shift-in code to indicate single-byte characters.

Language Support

  • The number of languages depends on platform, release & patch level.
  • The view V$NLS_VALID_VALUES shows all possible languages (American, German, French, Canadian French, etc.)
  • The parameters NLS_LANGUAGE, NLS_DATE_LANGUAGE & NLS_SORT will determine error message language, date language & sort order, respectively.
    • SQL> select * from v$nls_valid_values where parameter=’SORT’;
  • BINARY is the default value for NLS_SORT and is poor for any language besides English. It simply sorts by the ASCII numeric value of the character set.

Territory Support

  • The territory selected sets a number of parameter defaults for day and week numbering, credit and debit symbols, date formats (nls_date_format), decimal and group numeric separators (nls_numeric_characters), and currency symbols (nls_currency).

Globalization Support Features

  • Can be specified at any and all of five levels:
    • Database
    • Instance
    • Client environment
    • Session
    • Statement

Choosing a Character Set

  • The character set is used to store data in columns of type: VARCHAR2, CLOB, CHAR and LONG. Changing the character set can destroy data in these columns.
  • The National Character Set is a secondary character set specified at database creation time. It is used for columns of datatype: NVARCHAR2, NCLOB and NCHAR.
    • Must be either AL16UTF16 or UTF8. Other unicode sets are not supported.
  • The only limitation on the database character set is that it must have either US7ASCII or EBCDIC as a subset since SQL and PL/SQL are stored in these char sets.
  • The default character set is US7ASCII and the default national character set is AL16UTF16 (must be AL16UTF16 or UTF8).

Changing Character Sets

  • Two tools exist:
    • Database Character Set Scanner (csscan)
      • csscan system/passwd full=y tochar=utf8
      • checks database to see if problems may arise
      • one possible problem is that the new char set may require more bytes and therefore the new data won’t fit in the column
      • the output will show every row that will have problems with the new char set
    • Language and Character Set File Scanner (lcsscan)
      • checks text files to see what language the data is in
      • must be a straight text file
  • To change Character Sets of the DB
    • SQL> ALTER DATABASE CHARACTER SET …
    • SQL> ALTER DATABASE NATIONAL CHARACTER SET …
    • The new Character Set must be a superset of the old set but this does not guarantee no corruptions

Globalization Within the Database

  • SQL> select * from nls_database_parameters;

Globalization at the Instance Level

  • Instance parameter settings will override the database settings.
  • NLS_INSTANCE_PARAMETERS show the settings just like NLS_DATABASE_PARAMETERS

Client-Side Environment Settings

  • C:\> set nls_lang=Hungarian
  • It may be dangerous for clients to set their own NLS settings.
    • Conversion may not be possible: Asian -> US7ASCII
  • The conversion between client-side and server-side is done by Oracle Net.

Session-Level Globalization Settings

  • A user can set their NLS settings in two ways:
    • SQL> alter session set nls_date_format=’dd.mm.yyyy’;
    • SQL> execute dbms_session.set_nls(‘nls_date_format’,”‘dd.mm.yyyy’”);

Statement Globalization Settings

  • The tightest level of control over Globalization is to manage it from within each statement.
    • SQL> select to_char(hiredate,’Day dd, Month YYYY’,'nls_date_language=dutch’), to_char(hiredate, ‘Day dd, Month YYYY’,nls_date_language=german) from scott.emp;
  • Statement conversion choices:
    • to_date – nls_date_language, nls_calendar
    • to_number – nls_numeric_characters, nls_currency, nls_dual_currency, nls_iso_currency, nls_calendar
    • to_char, to_nchar – nls_date_language, nls_numeric_characters, nls_currency, nls_dual_currency, nls_iso_currency, nls_calendar

Linguistic Sorting and Selection

  • Oracle’s default sort order is binary, which converts characters to their ASCII codes and sorts them accordingly. This is suitable for American english but not for other languages.
  • Linguistic sorting means that Oracle will replace each character with a numeric value that represents its sorting value for the chosen language.
  • All nls_sort options can be suffixed with _AI or _CI for accent insensitive or case insensitive
    • SQL> alter session set nls_sort=FRENCH_CI;

The Locale Builder

  • A graphical tool to create customized Globalization environments
    • $ORACLE_HOME/nls/lbuilder/lbuilder

Using Time Zones

  • The Oracle environment can be made time zone aware by using datatypes:
    • TIMESTAMP WITH TIME ZONE – stores the timestamp and time zone it refers to
    • TIMESTAMP WITH LOCAL TIME ZONE – timestamp is converted to db and converted to user’s local time zone on retrieval
  • DATE and TIMESTAMP are always normalized to the database time zone and displayed unchanged when retrieved

Parameters

Language Support

  • NLS_LANGUAGE – defaults to AMERICAN
  • NLS_DATE_LANGUAGE – defaults to AMERICAN
  • NLS_SORT – defaults to BINARY which is a poor choice for many languages

Territory Support

  • NLS_TERRITORY – defaults to AMERICA
  • NLS_CURRENCY – defaults to $
  • NLS_DUAL_CURRENCY – defaults to $ (secondary currency for territory)
  • NLS_ISO_CURRENCY – defaults to AMERICA
  • NLS_DATE_FORMAT – defaults to DD-MM-RR
  • NLS_NUMERIC_CHARACTERS – defaults to ., (decimal and comma)
  • NLS_TIMESTAMP_FORMAT – defaults to DD-MM-RRHH.MI.SSXFF AM (for columns of type timestamp)
  • NLS_TIMESTAMP_TZ_FORMAT – defaults to DD-MM-RRHH.MI.SSXFF AM TZR (for columns of type TIMESTAMP WITH LOCAL TIMEZONE)

Other NLS Settings

  • NLS_CALENDAR – defaults to Gregorian
  • NLS_COMP – defaults to BINARY (comparison of letters by binary value)
  • NLS_LENGTH_SEMANTICS – defaults to BYTE (column length specified in bytes – not chars)
  • NLS_NCHAR_CONV_EXCP – defaults to false (won’t show error messages when conversions between different types fail – VARCHAR2 -> NVARCHAR)

Views & Tables

  • V$NLS_VALID_VALUES
  • V$NLS_PARAMETERS – Globalization settings currently in effect for your session
  • V$TIMEZONE_NAMES -list of supported time zones
  • NLS_DATABASE_PARAMETERS – shows current values of all NLS settings at db level
  • NLS_INSTANCE_PARAMETERS – shows current values of all NLS settings at instance level
  • NLS_SESSION_PARAMETERS – shows current values of all NLS settings at session level

Posted in Oracle | 1 Comment »