Monday, July 21, 2008

Export/Import and NLS Considerations

Subject: Export/Import and NLS Considerations
Doc ID: Note:15095.1 Type: FAQ
Last Revision Date: 07-JUL-2006 Status: PUBLISHED


Introduction
------------
This note provides an in-depth overview of exp/imp and NLS, you may not need to
know all this if you just want to take a export and make sure you don't lose any
data. Please read this first:
Note 227332.1 NLS considerations in Import/Export - Frequently Asked Questions


Export/Import and NLS considerations
------------------------------------
If you have exported/imported a database or table(s) and are now encountering
character set conversion problems, use the following information to confirm
whether the export/import procedure was performed correctly.

+ In Oracle9 and higher, most data is automaticaly exported in the character set
of the database that is exported. The only exception to that, is that
data-dictionary information is exported in the character set that is set in
the NLS_LANG environment variable when making the export.
In pre-Oracle9 export, ALL data is exported in the character set that is set
in the NLS_LANG environment variable when making the export.
This means that if the character set part of the NLS_LANG environment variable
during export is different than the database character set, there will be a
automatic conversion between those (for 9i and higher only for data-dictionary
data since "normal" data is exported in the database character set, regardless
of the NLS_LANG, as described above).

+ The export file contains the character set that the data was exported in, in
the first few bytes of the export file. Export stores the character set ID
(not the text string) of the "exporting character set". See Note 48644.1.
This is relevant in pre-Oracle9 export files because all data will be exported
in the same character set and this way we can "see" what that character set
is, and we can even "update" these fields to work around certain problems
(only when advised by Oracle Support!).
For Oracle9 and higher export files this is less relevant because the
character set is stored in many more places in the export file, and can be
different depending on the data. So for Oracle9 and higher we can never
"update" the export file manualy.

+ An import session runs in the character set that is set in the NLS_LANG
environment variable for import session.

+ Import will check the character set of the data in the dump file and compare
it with the session's character set as defined in NLS_LANG.

+ No conversion occurs if the export data character set and the import session
character set are the same. If they are not the same, conversion is performed
from the export data character set to the import session character set prior
to the data being inserted into the database.
However, import can only perform this conversion for single-byte character
sets !

+ For imports into multibyte character set databases (i.e. UTF8) the character
set part of NLS_LANG should be set identical to that of the export session.
Otherwise an
IMP-16 "Required character set conversion (type %lu to %lu) not supported"
error will come up during import. See also Bug 896407.

+ The import session character set should be a the same, or a superset of the
export data character set, otherwise special characters will not be correctly
converted.

+ If you import pre-oracle7 export files include the parameter 'CHARSET' when
defining the import parameter set. CHARSET identifies the character set of the
export file. The CHARSET option was developed to import older export files
which did not have stored character set ID information.
If you define CHARSET but the export file does contain character set
information (export version 7 and higher) then the value in CHARSET should
match the export file character set. If they do not match, IMP-42 will
result (so it's better not to use this parameter at all, unless you import
pre-oracle7 data).

+ After the data has been converted to the import session character set, it is
then converted to the database character set if they differ. The database
character set should be a superset (or the same) of the import's session
character set otherwise special characters will not be correctly converted.

So this means that there is a potential for 3 seperate conversions when moving
data from between databases, as described in the following graph:

------------------- for all data pre-oracle9 and data-
| db in character | export dictionary data in oracle9 and up:
| set A |---------------------> exp session is in character
------------------- set B as defined by NLS_LANG.
source Therefore the dump file is in
character set B. Character set
conversion may occur.
|
|
| move file over to
| another machine
|
V
destination
-------------------
| db in character | import imp session is in character
| set D | <------------------- set C as defined by NLS_LANG.
------------------- The dump file is still in
character set B. Character set
conversion may occur.
During the import process
character set conversion
may occur between character
set C and the db's character
set D if they differ.


If you have any problems related to character set conversion in export/import
then you need to identify the following:
-- What is (was) the database character set in the source database (character
set A in the above)?
-- What was the client character set specified in NLS_LANG when the data was
exported (character set B in the above)?
-- What was the client character set specified in NLS_LANG when the data was
imported (character set C in the above) ?
-- What is the database character set of the destination database (character
set D in the above)?


Minimizing character set conversions during export/import
---------------------------------------------------------
As described, it is important to note import will do up to 3 character set
conversions depending on:

(a) character set of exported database
(b) NLS_LANG of export session (-> this is the "export file character set")
(c) NLS_LANG of import session
(d) character set of taget database.

Obviously there really is only the need for 1 conversion only (from original
database character set to target database character set). Minimizing the number
of conversions means that you minimize the potential of "loosing" data.

To minimize the number of conversions you are advised to follow these rules:
+ Set the NLS_LANG during export to the same as the character set of the
exported database -> this means no conversion takes place, all data is still
stored in the export file as it was stored in the database.
+ Set the NLS_LANG during import to the same value as during the export -> this
means that no conversion takes place in the import session.
+ If the character set of the target database is different, then the data will
be automatically converted when import inserts the data into the database, you
do not have to "set" anything for this, it's automatic.

Of course you do not have to follow these rules, but complying with these rules
does minimize the risk of loosing data during export/import.


Regarding the National Characterset:
------------------------------------
NCLOBs are always exported/imported in UCS-2/AL16UTF16.
NCHAR/NVARCHAR2s are always exported in the database's national character set.
This is something you can't influence by setting any parameters.

No comments: