Monday, July 21, 2008

NLS considerations in Import/Export - Frequently Asked Questions

Subject: NLS considerations in Import/Export - Frequently Asked Questions
Doc ID: Note:227332.1 Type: BULLETIN
Last Revision Date: 17-JUL-2008 Status: PUBLISHED


Globalization (NLS): NLS considerations in Import/Export - Frequently Asked Questions
-------------------------------------------------------------------------------------
For the main Globalization (NLS) FAQ please see:
Note 60134.1 Globalization (NLS) - Frequently Asked Questions

1. How does NLS affect import/export (exp/imp)?
2. How should NLS_LANG be set when using export?
3. How should NLS_LANG be set when using import?
4. How is import affected by the NLS_LANGUAGE and NLS_TERRITORY ?
5. I have the message "( possible ncharset conversion )" during import.
6. How to know in what characterset a dmp (export) file is created?
7. How does NLS affect datapump (expdp/impdp)?
8. What causes ORA-01401 or ORA-12899 during import (imp and impdp) ?


For the main exp/imp FAQ please see:
Note 175624.1 Oracle Server - Export and Import FAQ

Globalization (NLS): NLS considerations in Import/Export - Frequently Asked Questions
-------------------------------------------------------------------------------------
1. How does NLS affect import/export(exp/imp)??

Import and export are client products, in the same way as SQL*Plus or
Oracle Forms, and will therefore translate characters from the database
character set to that defined by NLS_LANG. The character set used for
the export will be stored in the export file and, when the file is imported,
the import will check the character set that was used. If it is different to
that defined by NLS_LANG at the import site, the characters will be
translated to the import character set and then, if necessary, to the
database character set.

References:
Note 15095.1 Export/Import and NLS Considerations
Note 48644.1 Identifying the Export Character Set

2. How should NLS_LANG be set when using export?

Oracle recommends to set the character set part of NLS_LANG environment parameter
to ALWAYS the same character set as the character set of the database you are exporting.

select value from nls_database_parameters where parameter='NLS_CHARACTERSET';

That way no conversion will take place and the exportfile will be created
in the same character set as the original database and contain ALL data from
original database (even incorrectly stored data if that would be the case).
Even if the plan is to import this into a database with a different character set
later the conversion can be postponed until the import.

Note that this has no relation with the Operating system. If your have a
WE8MSWIN1252 database on a unix server (which is totally supported) then you
should set NLS_LANG to AMERICAN_AMERICA.WE8MSWIN1252 before export.

During *interaction* with the database (= sqlplus) you need to configure
your *unix* client properly and that cannot be 1252 seen *unix* does not
has a 1252 characterset
Note 264157.1 The correct NLS_LANG setting in Unix Environments


3. How should NLS_LANG be set when using import?

If the source and target database have the same character set,
the character set part of the NLS_LANG should be set to that same character set
on both the export and the import.

Even if the character sets of the exporting and importing databases
are not the same the best (preferred) value to use for
the character set part of NLS_LANG on both export and import is still
the character set of the source database.

select value from nls_database_parameters where parameter='NLS_CHARACTERSET';

Setting the NLS_LANG to the character set of the target database during import
is also correct as such, but import has some limitations when going to a multibyte
characterset (like UTF8), hence using the SOURCE NLS_CHARACTERSET during both imp and exp
session is simply the best option, avoiding any problems like
IMP-16 "Required character set conversion (type %lu to %lu) not supported".

So, the preferred place to do the conversion is between the import executable
and the target database.

Note that this has no relation with the Operating system. If your source
database is a WE8MSWIN1252 database then you simply should set NLS_LANG
to AMERICAN_AMERICA.WE8MSWIN1252 before import, even on a Unix server.

Note that during *interaction* with the database (= sqlplus) you need to
configure your *unix* client properly and that cannot be 1252 seen *unix*
does not has a 1252 characterset
Note 264157.1 The correct NLS_LANG setting in Unix Environments


4. Example: you want to go from an WE8MSWIN1252 to an UTF8 db:

(note that this is only the exp/imp example, if you want to migrate to UTF8
check Note 260192.1 for the full story )

1) double check the NLS_CHARACTERSET on the SOURCE database

select * from nls_database_parameters where parameter = 'NLS_CHARACTERSET';

and export with the NLS_LANG set to AMERICAN_AMERICA.

In this case we want to create a export file containing WE8MSWIN1252 data.

(This is also the setting you want to use if you take an
export as backup)

on unix this is:
$ set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
$ export NLS_LANG
$ exp ....

on windows this is:

c:\>set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
c:\>exp ....

2) import with the NLS_LANG set to American_america.WE8MSWIN1252 (= source NLS_CHARACTERSET)
into the new UTF8 db.

on unix this is:
$ set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
$ export NLS_LANG
$ imp ....

on windows this is:

c:\>set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
c:\>imp ....


The conversion to UTF8 is done while inserting the data
in the UTF8 database by the imp connection.

We recommend to set the NLS_LANG explicit in the current shell for unix
(-> Note 131207.1 How to Set Unix Environment Variable )
or in the dos box used for the exp or imp tool on windows.
(-> "c:\>set NLS_LANG=AMERICAN_AMERICA.")

4. How is import affected by the NLS_LANGUAGE and NLS_TERRITORY ?

Not. Normally you use the AMERICAN_AMERICA default, but
if you imported with NLS_LANG set to FRENCH_FRANCE for example
then you will not have problems, even if the originating environment
used GERMAN_GERMANY or so.

5. I have the message "( possible ncharset conversion )" during import.

you see something similar to

Export file created by EXPORT:V08.01.07 via direct path
import done in WE8ISO8859P15 character set and AL16UTF16 NCHAR character set
export server uses WE8ISO8859P15 NCHAR character set (possible ncharset conversion)

in the import log, please see point 11 in
Note 276914.1 The National Character Set in Oracle 9i and 10g

6. How to know in what characterset a dmp (export) file is created?

simply issue: imp system/oracle@database show=yes file=test.dmp

the output gives you

import done in US7ASCII character set and AL16UTF16 NCHAR character set
-> this is the current NLS_LANG value set in the environment
and the NCHAR characterset of the target database

import server uses WE8MSWIN1252 character set (possible charset conversion)
-> this is only shown if the NLS_LANG during this import session is different
from the target database characterset, so if you see 3 lines you might have problems :-)

export client uses UTF8 character set (possible charset conversion)
-> this is the characterset used during the export session and the
characterset used in the dmp file.

7. How does NLS affect datapump (expdp/impdp)?

Datapump does not use the NLS_LANG to do conversion between databases.
Conversion between 2 database charactersets is done purely based on the
NLS_CHARACTERSET ( or NLS_NCHAR_CHARACTTERSET for Nchar,Nvarchar and Nclob datatypes)
of the source and target database.

However, if you specify a parameter file then the NLS_LANG *is* used.
This is only important if you use non-English characters (e.g. for the
QUERY parameter) in the parameter file.

If you use non-English characters in the parameter file then the NLS_LANG
environment variable should be set (in the session where the Data Pump job
is started) to the correct encoding of the parameter file (!).

Note:
-----

There is a chance to have data corruption when going from a 8 bit
characterset to UTF8 or a other multibyte characterset on ALL 10g versions
(including 10.1.0.5 and 10.2.0.3) and 11.1.0.6.
Impdp may provoke data corruption unless you applied Patch 5874989.
This is caused by the impdp Bug 5874989.
The "old" exp/imp work fine. This problem is fixed in the upcoming
10.2.0.4 and 11.1.0.7 patchset.

All existing patches for this bug are found here:
http://updates.oracle.com/download/5874989.html

For 10.1.0.5, 10.2.0.2, 10.2.0.3 and 11.1.0.6 you can ask a backport
for Bug 5874989 if there is no patch yet for your version/platform.

For windows the fix is included in
10.1.0.5.0 Patch 20 (10.1.0.5.20P) or later ( Note 276548.1 )
10.2.0.3.0 Patch 11 (10.2.0.3.11P) or later ( Note 342443.1 )

The patch is technically only needed on the impdp side, but if you use
expdp/impdp between different character sets we suggest to patch all your
systems.


8. What causes ORA-01401 or ORA-12899 during import (imp and impdp) ?

9i and lower gives ORA-01401: inserted value too large for column
10g and up gives ORA-12899: value too large for column

This is seen when exporting from a database with a 8 bit NLS_CHARACTERSET
(like WE8ISO8859P1, WE8MSWIN1252 , WE8DEC ...) or 16 bit NLS_CHARACTERSET
(like JA16SJIS , ZHS16GBK, KO16MSWIN949) to a database with a
NLS_CHARACTERSET set to AL32UTF8 or UTF8

(AL32)UTF8 uses more BYTES to store a character then a 8 and 16 bit charactersets.
This is explained in Note 119119.1 AL32UTF8 / UTF8 (Unicode) Database Character Set
Implications

CHAR and VARCHAR2 colum sizes are defined by default in BYTES not characterset.

The best solution is to pre-create the tables using CHAR semantics (9i and up)
Note 144808.1 Examples and limits of BYTE and CHAR semantics usage


Other Known Problems:
---------------------
Note 278980.1 Oracle10g Import Errors like IMP-00008 / IMP-00009 if Eported with NLS_LANG=FRENCH_FRANCE.[characterset]


References
----------

For further NLS / Globalization information you may start here:
Note 267942.1 - Globalization Technology (NLS) Knowledge Browser Product Page
Note 60134.1 Globalization (NLS) - Frequently Asked Questions

Further details of issues discussed in this note can be found in:
Note 15095.1 Export/Import and NLS Considerations
Note 48644.1 Identifying the Export Character Set

Note 175624.1 Oracle Server - Export and Import FAQ

No comments: