Zurück

Using SQL Loader with an UTF8 Characterset Database

You never had any troubles with charactersets using Oracle’s SQL Loader utility?
If you accept as true then enjoy the day or find another interesting tip on our site.

Situation

We had erroneous data after loading with SQL Loader.

Settings

Database created with

CHARACTER SET "UTF8"
NATIONAL CHARACTER SET "WE8ISO8859P1"
NLS_LANGUAGE = AMERICAN
NLS_TERRITORY = AMERICA

Example

Create a test table with

CONNECT scott/tiger;
DROP TABLE LOADER_TEST;
CREATE TABLE LOADER_TEST (
   USR_ID          NUMBER NOT NULL ,
   USR_NAME        VARCHAR2(50) ,
   USR_LNK_NAME    VARCHAR2(50) ,
   USR_LNK_ORDER   NUMBER
);

  • Case 1

Client (Win 2000)

NLS_LANG=AMERICAN_AMERICA.UTF8

Controlfile (ctrl.txt)

LOAD DATA
CHARACTERSET UTF8
INFILE *
REPLACE INTO TABLE LOADER_TEST
FIELDS TERMINATED BY ';'
TRAILING NULLCOLS (
   USR_ID         INTEGER EXTERNAL,
   USR_NAME       CHAR(50),
   USR_LNK_NAME   CHAR(50),
   USR_LNK_ORDER  INTEGER EXTERNAL
)
BEGINDATA
1;Santé bien;http://www.vaud-sante.ch;3
2;Santé;http://www.vaud-sante.ch;4
3;Alle à gessa;http://www.gessa.com/now;2
4;Alle à;http://www.gessa.com/now;1
5;Gägs;http://www.gaegs.ch;5
6;Gägs ä;http://www.gaegs.ch;6

Invoke sqlldr

sqlldr userid=scott/tiger control=ctrl.txt

Result

column usr_id format 99 heading "Id"
column USR_NAME format a32 heading "Name" 
column USR_LNK_NAME format a26 heading "Link"
column USR_LNK_ORDER format 99 heading "Ord"
select * from loader_test;

Id Name                             Link                      Ord
-- -------------------------------- ------------------------- ---
 1 Santé bien                       http://www.vaud-sante.ch    3
 2 Santé;http://www.vaud-sante.ch   4
 3 Alle à gessa                     http://www.gessa.com/now    2
 4 Alle à;http://www.gessa.com/now  1
 5 Gägs                             http://www.gaegs.ch         5
 6 Gägs ä;http://www.gaegs.ch        6

For the records 2,4,6 where a special character is prior to the delimiter, the SQL Loader does not see the delimiter.

  • Case 2

Change the control file (ctrl.txt) to

CHARACTERSET WE8ISO8859P1

Reload the data and select the LOADER_TEST table again

Id Name                             Link                      Ord
-- -------------------------------- ------------------------- ---
 1 Santé bien                      http://www.vaud-sante.ch    3
 2 Santé                            http://www.vaud-sante.ch    4
 3 Alle à gessa                    http://www.gessa.com/now    2
 4 Alle à                          http://www.gessa.com/now    1
 5 Gägs                             http://www.gaegs.ch         5
 6 Gägs ä                         http://www.gaegs.ch         6

Now, SQL Loader recognises all delimiters, but data seems to be corrupted.

  • Case 3

Change the client environment to

NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

Reselect data - now the result is now correct !

 Id Name                           Link                       Ord
--- ------------------------------ -------------------------- ---
  1 Santé bien                     http://www.vaud-sante.ch     3
  2 Santé                          http://www.vaud-sante.ch     4
  3 Alle à gessa                   http://www.gessa.com/now     2
  4 Alle à                         http://www.gessa.com/now     1
  5 Gägs                           http://www.gaegs.ch          5
  6 Gägs ä                         http://www.gaegs.ch          6

Conclusion

Running client tools (e.g. SQL Loader, SQL Plus) against a database with UTF8 character set, you must assign WE8ISO8859P1 to NLS_LANG for the client tools due to prober data representation. That means a correct characterset conversion will take place within SQL Net and Net8 respectively.