<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>encoding - allows you to write your script in non-ascii or non-utf8 - Perldoc Browser</title> <link rel="search" href="/opensearch.xml" type="application/opensearchdescription+xml" title="Perldoc Browser"> <link rel="canonical" href="https://perldoc.perl.org/encoding"> <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css" integrity="sha384-JcKb8q3iqJ61gNV9KGb8thSsNjpSL0n8PARn9HuZOnIxN0hoP+VmmDGMN5t9UJ0Z" crossorigin="anonymous"> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.5.0/styles/stackoverflow-light.min.css" integrity="sha512-cG1IdFxqipi3gqLmksLtuk13C+hBa57a6zpWxMeoY3Q9O6ooFxq50DayCdm0QrDgZjMUn23z/0PMZlgft7Yp5Q==" crossorigin="anonymous" /> <style> body { background: #f4f4f5; color: #020202; } .navbar-dark { background-image: -webkit-linear-gradient(top, #005f85 0, #002e49 100%); background-image: -o-linear-gradient(top, #005f85 0, #002e49 100%); background-image: linear-gradient(to bottom, #005f85 0, #002e49 100%); filter: progid:DXImageTransform.Microsoft.gradient(startColorstr='#ff005f85', endColorstr='#ff002e49', GradientType=0); background-repeat: repeat-x; } .navbar-dark .navbar-nav .nav-link, .navbar-dark .navbar-nav .nav-link:focus { color: #fff } .navbar-dark .navbar-nav .nav-link:hover { color: #ffef68 } #wrapperlicious { margin: 0 auto; font: 0.9em 'Helvetica Neue', Helvetica, sans-serif; font-weight: normal; line-height: 1.5em; margin: 0; padding: 0; } #wrapperlicious h1 { font-size: 1.5em } #wrapperlicious h2 { font-size: 1.3em } #wrapperlicious h3 { font-size: 1.1em } #wrapperlicious h4 { font-size: 0.9em } #wrapperlicious h1, #wrapperlicious h2, #wrapperlicious h3, #wrapperlicious h4, #wrapperlicious dt { color: #020202; margin-top: 1em; margin-bottom: 1em; position: 
relative; font-weight: bold; } #wrapperlicious a { color: inherit; text-decoration: underline } #wrapperlicious #toc { text-decoration: none } #wrapperlicious a:hover { color: #2a2a2a } #wrapperlicious a img { border: 0 } #wrapperlicious :not(pre) > code { color: inherit; background-color: rgba(0, 0, 0, 0.04); border-radius: 3px; font: 0.9em Consolas, Menlo, Monaco, monospace; padding: 0.3em; } #wrapperlicious dd { margin: 0; margin-left: 2em; } #wrapperlicious dt { color: #2a2a2a; font-weight: bold; margin-left: 0.9em; } #wrapperlicious p { margin-bottom: 1em; margin-top: 1em; } #wrapperlicious li > p { margin-bottom: 0; margin-top: 0; } #wrapperlicious pre { border: 1px solid #c1c1c1; border-radius: 3px; font: 100% Consolas, Menlo, Monaco, monospace; margin-bottom: 1em; margin-top: 1em; } #wrapperlicious pre > code { display: block; background-color: #f6f6f6; font: 0.9em Consolas, Menlo, Monaco, monospace; line-height: 1.5em; text-align: left; white-space: pre; padding: 1em; } #wrapperlicious dl, #wrapperlicious ol, #wrapperlicious ul { margin-bottom: 1em; margin-top: 1em; } #wrapperlicious ul { list-style-type: square; } #wrapperlicious ul ul { margin-bottom: 0px; margin-top: 0px; } #footer { font-size: 0.8em; padding-top: 0.5em; text-align: center; } #more { display: inline; font-size: 0.8em; } #perldocdiv { background-color: #fff; border: 1px solid #c1c1c1; border-bottom-left-radius: 5px; border-bottom-right-radius: 5px; margin-left: auto; margin-right: auto; padding: 3em; padding-top: 1em; max-width: 960px; } #moduleversion { float: right } #wrapperlicious .leading-notice { font-style: italic; padding-left: 1em; margin-top: 1em; margin-bottom: 1em; } #wrapperlicious .permalink { display: none; left: -0.75em; position: absolute; padding-right: 0.25em; text-decoration: none; } #wrapperlicious h1:hover .permalink, #wrapperlicious h2:hover .permalink, #wrapperlicious h3:hover .permalink, #wrapperlicious h4:hover .permalink, #wrapperlicious dt:hover .permalink { 
display: block; } </style> <!-- Global site tag (gtag.js) - Google Analytics --> <script async src="https://www.googletagmanager.com/gtag/js?id=G-KVNWBNT5FB"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-KVNWBNT5FB'); gtag('config', 'UA-50555-3'); </script> </head> <body> <nav class="navbar navbar-expand-md navbar-dark bg-dark justify-content-between"> <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNav" aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <a class="navbar-brand" href="/">Perldoc Browser</a> <div class="collapse navbar-collapse" id="navbarNav"> <ul class="navbar-nav mr-auto"> <li class="nav-item dropdown text-nowrap"> <a class="nav-link dropdown-toggle" href="#" id="dropdownlink-stable" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">5.18.4</a> <div class="dropdown-menu" aria-labelledby="dropdownlink-stable"> <a class="dropdown-item" href="/encoding">Latest</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.40.0/encoding">5.40.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.38.2/encoding">5.38.2</a> <a class="dropdown-item" href="/5.38.1/encoding">5.38.1</a> <a class="dropdown-item" href="/5.38.0/encoding">5.38.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.36.3/encoding">5.36.3</a> <a class="dropdown-item" href="/5.36.2/encoding">5.36.2</a> <a class="dropdown-item" href="/5.36.1/encoding">5.36.1</a> <a class="dropdown-item" href="/5.36.0/encoding">5.36.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.34.3/encoding">5.34.3</a> <a class="dropdown-item" href="/5.34.2/encoding">5.34.2</a> <a class="dropdown-item" href="/5.34.1/encoding">5.34.1</a> <a class="dropdown-item" 
href="/5.34.0/encoding">5.34.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.32.1/encoding">5.32.1</a> <a class="dropdown-item" href="/5.32.0/encoding">5.32.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.30.3/encoding">5.30.3</a> <a class="dropdown-item" href="/5.30.2/encoding">5.30.2</a> <a class="dropdown-item" href="/5.30.1/encoding">5.30.1</a> <a class="dropdown-item" href="/5.30.0/encoding">5.30.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.28.3/encoding">5.28.3</a> <a class="dropdown-item" href="/5.28.2/encoding">5.28.2</a> <a class="dropdown-item" href="/5.28.1/encoding">5.28.1</a> <a class="dropdown-item" href="/5.28.0/encoding">5.28.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.26.3/encoding">5.26.3</a> <a class="dropdown-item" href="/5.26.2/encoding">5.26.2</a> <a class="dropdown-item" href="/5.26.1/encoding">5.26.1</a> <a class="dropdown-item" href="/5.26.0/encoding">5.26.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.24.4/encoding">5.24.4</a> <a class="dropdown-item" href="/5.24.3/encoding">5.24.3</a> <a class="dropdown-item" href="/5.24.2/encoding">5.24.2</a> <a class="dropdown-item" href="/5.24.1/encoding">5.24.1</a> <a class="dropdown-item" href="/5.24.0/encoding">5.24.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.22.4/encoding">5.22.4</a> <a class="dropdown-item" href="/5.22.3/encoding">5.22.3</a> <a class="dropdown-item" href="/5.22.2/encoding">5.22.2</a> <a class="dropdown-item" href="/5.22.1/encoding">5.22.1</a> <a class="dropdown-item" href="/5.22.0/encoding">5.22.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.20.3/encoding">5.20.3</a> <a class="dropdown-item" href="/5.20.2/encoding">5.20.2</a> <a class="dropdown-item" href="/5.20.1/encoding">5.20.1</a> <a class="dropdown-item" href="/5.20.0/encoding">5.20.0</a> <div 
class="dropdown-divider"></div> <a class="dropdown-item active" href="/5.18.4/encoding">5.18.4</a> <a class="dropdown-item" href="/5.18.3/encoding">5.18.3</a> <a class="dropdown-item" href="/5.18.2/encoding">5.18.2</a> <a class="dropdown-item" href="/5.18.1/encoding">5.18.1</a> <a class="dropdown-item" href="/5.18.0/encoding">5.18.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.16.3/encoding">5.16.3</a> <a class="dropdown-item" href="/5.16.2/encoding">5.16.2</a> <a class="dropdown-item" href="/5.16.1/encoding">5.16.1</a> <a class="dropdown-item" href="/5.16.0/encoding">5.16.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.14.4/encoding">5.14.4</a> <a class="dropdown-item" href="/5.14.3/encoding">5.14.3</a> <a class="dropdown-item" href="/5.14.2/encoding">5.14.2</a> <a class="dropdown-item" href="/5.14.1/encoding">5.14.1</a> <a class="dropdown-item" href="/5.14.0/encoding">5.14.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.12.5/encoding">5.12.5</a> <a class="dropdown-item" href="/5.12.4/encoding">5.12.4</a> <a class="dropdown-item" href="/5.12.3/encoding">5.12.3</a> <a class="dropdown-item" href="/5.12.2/encoding">5.12.2</a> <a class="dropdown-item" href="/5.12.1/encoding">5.12.1</a> <a class="dropdown-item" href="/5.12.0/encoding">5.12.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.10.1/encoding">5.10.1</a> <a class="dropdown-item" href="/5.10.0/encoding">5.10.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.8.9/encoding">5.8.9</a> <a class="dropdown-item" href="/5.8.8/encoding">5.8.8</a> <a class="dropdown-item" href="/5.8.7/encoding">5.8.7</a> <a class="dropdown-item" href="/5.8.6/encoding">5.8.6</a> <a class="dropdown-item" href="/5.8.5/encoding">5.8.5</a> <a class="dropdown-item" href="/5.8.4/encoding">5.8.4</a> <a class="dropdown-item" href="/5.8.3/encoding">5.8.3</a> <a class="dropdown-item" 
href="/5.8.2/encoding">5.8.2</a> <a class="dropdown-item" href="/5.8.1/encoding">5.8.1</a> <a class="dropdown-item" href="/5.8.0/encoding">5.8.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.6.2/encoding">5.6.2</a> <a class="dropdown-item" href="/5.6.1/encoding">5.6.1</a> <a class="dropdown-item" href="/5.6.0/encoding">5.6.0</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.005_04/encoding">5.005_04</a> <a class="dropdown-item" href="/5.005_03/encoding">5.005_03</a> <a class="dropdown-item" href="/5.005_02/encoding">5.005_02</a> <a class="dropdown-item" href="/5.005_01/encoding">5.005_01</a> <a class="dropdown-item" href="/5.005/encoding">5.005</a> </div> </li> <li class="nav-item dropdown text-nowrap"> <a class="nav-link dropdown-toggle" href="#" id="dropdownlink-dev" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Dev</a> <div class="dropdown-menu" aria-labelledby="dropdownlink-dev"> <a class="dropdown-item" href="/blead/encoding">blead</a> <a class="dropdown-item" href="/5.41.6/encoding">5.41.6</a> <a class="dropdown-item" href="/5.41.5/encoding">5.41.5</a> <a class="dropdown-item" href="/5.41.4/encoding">5.41.4</a> <a class="dropdown-item" href="/5.41.3/encoding">5.41.3</a> <a class="dropdown-item" href="/5.41.2/encoding">5.41.2</a> <a class="dropdown-item" href="/5.41.1/encoding">5.41.1</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.40.0-RC2/encoding">5.40.0-RC2</a> <a class="dropdown-item" href="/5.40.0-RC1/encoding">5.40.0-RC1</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.39.10/encoding">5.39.10</a> <a class="dropdown-item" href="/5.39.9/encoding">5.39.9</a> <a class="dropdown-item" href="/5.39.8/encoding">5.39.8</a> <a class="dropdown-item" href="/5.39.7/encoding">5.39.7</a> <a class="dropdown-item" href="/5.39.6/encoding">5.39.6</a> <a class="dropdown-item" href="/5.39.5/encoding">5.39.5</a> <a class="dropdown-item" 
href="/5.39.4/encoding">5.39.4</a> <a class="dropdown-item" href="/5.39.3/encoding">5.39.3</a> <a class="dropdown-item" href="/5.39.2/encoding">5.39.2</a> <a class="dropdown-item" href="/5.39.1/encoding">5.39.1</a> </div> </li> <li class="nav-item dropdown text-nowrap"> <a class="nav-link dropdown-toggle" href="#" id="dropdownlink-nav" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Documentation</a> <div class="dropdown-menu" aria-labelledby="dropdownlink-nav"> <a class="dropdown-item" href="/5.18.4/perl">Perl</a> <a class="dropdown-item" href="/5.18.4/perlintro">Intro</a> <a class="dropdown-item" href="/5.18.4/perl#Tutorials">Tutorials</a> <a class="dropdown-item" href="/5.18.4/perlfaq">FAQs</a> <a class="dropdown-item" href="/5.18.4/perl#Reference-Manual">Reference</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.18.4/perlop">Operators</a> <a class="dropdown-item" href="/5.18.4/functions">Functions</a> <a class="dropdown-item" href="/5.18.4/variables">Variables</a> <a class="dropdown-item" href="/5.18.4/modules">Modules</a> <a class="dropdown-item" href="/5.18.4/perlutil">Utilities</a> <div class="dropdown-divider"></div> <a class="dropdown-item" href="/5.18.4/perldelta">Release Notes</a> <a class="dropdown-item" href="/5.18.4/perlcommunity">Community</a> <a class="dropdown-item" href="/5.18.4/perlhist">History</a> </div> </li> </ul> <ul class="navbar-nav"> <script> function set_expand (expand) { var perldocdiv = document.getElementById('perldocdiv'); var width = window.getComputedStyle(perldocdiv).getPropertyValue('max-width'); var expanded = (width == '' || width == 'none') ? true : false; if (expand === null) { expand = !expanded; } if ((expand && !expanded) || (!expand && expanded)) { perldocdiv.style.setProperty('max-width', expand ? 
'none' : '960px'); var button_classlist = document.getElementById('content-expand-button').classList; if (expand) { button_classlist.add('btn-light'); button_classlist.remove('btn-outline-light'); } else { button_classlist.add('btn-outline-light'); button_classlist.remove('btn-light'); } } return expand; } function toggle_expand () { var expand = set_expand(null); document.cookie = 'perldoc_expand=' + (expand ? 1 : 0) + '; path=/; expires=Tue, 19 Jan 2038 03:14:07 UTC'; } function read_expand () { return document.cookie.split(';').some(function (item) { return item.indexOf('perldoc_expand=1') >= 0 }); } if (document.readyState === 'loading') { document.addEventListener('DOMContentLoaded', function () { if (read_expand()) { set_expand(true); } }); } else if (read_expand()) { set_expand(true); } </script> <button id="content-expand-button" type="button" class="btn btn-outline-light d-none d-lg-inline-block mr-4" onclick="toggle_expand()">Expand</button> </ul> <form class="form-inline" method="get" action="/5.18.4/search"> <input class="form-control mr-3" type="search" name="q" placeholder="Search" aria-label="Search" value=""> </form> </div> </nav> <div id="wrapperlicious" class="container-fluid"> <div id="perldocdiv"> <div id="links"> <a href="/5.18.4/encoding">encoding</a> <div id="more"> (<a href="/5.18.4/encoding.txt">source</a>, <a href="https://metacpan.org/pod/encoding">CPAN</a>) </div> <div id="moduleversion">version 2.6_01</div> </div> <div class="leading-notice"> You are viewing the version of this documentation from Perl 5.18.4. 
<a href="/encoding">View the latest version</a> </div> <h1><a id="toc">CONTENTS</a></h1> <ul> <li> <a class="text-decoration-none" href="#NAME">NAME</a> </li> <li> <a class="text-decoration-none" href="#WARNING">WARNING</a> </li> <li> <a class="text-decoration-none" href="#SYNOPSIS">SYNOPSIS</a> </li> <li> <a class="text-decoration-none" href="#ABSTRACT">ABSTRACT</a> <ul> <li> <a class="text-decoration-none" href="#Literal-Conversions">Literal Conversions</a> </li> <li> <a class="text-decoration-none" href="#PerlIO-layers-for-STD(IN%7COUT)">PerlIO layers for STD(IN|OUT)</a> </li> <li> <a class="text-decoration-none" href="#Implicit-upgrading-for-byte-strings">Implicit upgrading for byte strings</a> </li> <li> <a class="text-decoration-none" href="#Side-effects">Side effects</a> </li> </ul> </li> <li> <a class="text-decoration-none" href="#FEATURES-THAT-REQUIRE-5.8.1">FEATURES THAT REQUIRE 5.8.1</a> </li> <li> <a class="text-decoration-none" href="#USAGE">USAGE</a> </li> <li> <a class="text-decoration-none" href="#The-Filter-Option">The Filter Option</a> <ul> <li> <a class="text-decoration-none" href="#Filter-related-changes-at-Encode-version-1.87">Filter-related changes at Encode version 1.87</a> </li> </ul> </li> <li> <a class="text-decoration-none" href="#CAVEATS">CAVEATS</a> <ul> <li> <a class="text-decoration-none" href="#NOT-SCOPED">NOT SCOPED</a> </li> <li> <a class="text-decoration-none" href="#DO-NOT-MIX-MULTIPLE-ENCODINGS">DO NOT MIX MULTIPLE ENCODINGS</a> </li> <li> <a class="text-decoration-none" href="#tr///-with-ranges">tr/// with ranges</a> <ul> <li> <a class="text-decoration-none" href="#workaround-to-tr///;">workaround to tr///;</a> </li> </ul> </li> </ul> </li> <li> <a class="text-decoration-none" href="#EXAMPLE-Greekperl">EXAMPLE - Greekperl</a> </li> <li> <a class="text-decoration-none" href="#KNOWN-PROBLEMS">KNOWN PROBLEMS</a> <ul> <li> <a class="text-decoration-none" href="#The-Logic-of-:locale">The Logic of :locale</a> </li> </ul> </li> <li> 
<a class="text-decoration-none" href="#HISTORY">HISTORY</a> </li> <li> <a class="text-decoration-none" href="#SEE-ALSO">SEE ALSO</a> </li> </ul> <h1 id="NAME"><a class="permalink" href="#NAME">#</a>NAME</h1> <p>encoding - allows you to write your script in non-ascii or non-utf8</p> <h1 id="WARNING"><a class="permalink" href="#WARNING">#</a>WARNING</h1> <p>This module is deprecated under perl 5.18. It uses a mechanism provided by perl that is deprecated under 5.18 and higher, and may be removed in a future version.</p> <h1 id="SYNOPSIS"><a class="permalink" href="#SYNOPSIS">#</a>SYNOPSIS</h1> <pre><code>use encoding "greek";  # Perl like Greek to you?
use encoding "euc-jp"; # Jperl!

# or you can even do this if your shell supports your native encoding

perl -Mencoding=latin2 -e'...' # Feeling centrally European?
perl -Mencoding=euc-kr -e'...' # Or Korean?

# more control

# A simple euc-cn => utf-8 converter
use encoding "euc-cn", STDOUT => "utf8";  while(&lt;&gt;){print};

# "no encoding;" supported (but not scoped!)
no encoding;

# an alternate way, Filter
use encoding "euc-jp", Filter=>1;
# now you can use kanji identifiers -- in euc-jp!

# switch on locale -
# note that this probably means that unless you have a complete control
# over the environments the application is ever going to be run, you should
# NOT use the feature of encoding pragma allowing you to write your script
# in any recognized encoding because changing locale settings will wreck
# the script; you can of course still use the other features of the pragma.
use encoding ':locale';</code></pre> <h1 id="ABSTRACT"><a class="permalink" href="#ABSTRACT">#</a>ABSTRACT</h1> <p>Let's start with a bit of history: Perl 5.6.0 introduced Unicode support. You could apply <code>substr()</code> and regexes even to complex CJK characters -- so long as the script was written in UTF-8.
But back then, text editors that supported UTF-8 were still rare and many users instead chose to write scripts in legacy encodings, giving up a whole new feature of Perl 5.6.</p> <p>Rewind to the future: starting from perl 5.8.0 with the <b>encoding</b> pragma, you can write your script in any encoding you like (so long as the <code>Encode</code> module supports it) and still enjoy Unicode support. This pragma achieves that by doing the following:</p> <ul> <li><p>Internally converts all literals (<code>q//, qq//, qr//, qw//, qx//</code>) from the specified encoding to utf8. In Perl 5.8.1 and later, literals in <code>tr///</code> and the <code>DATA</code> pseudo-filehandle are also converted.</p> </li> <li><p>Changes the PerlIO layers of <code>STDIN</code> and <code>STDOUT</code> to the specified encoding.</p> </li> </ul> <h2 id="Literal-Conversions"><a class="permalink" href="#Literal-Conversions">#</a><a id="Literal"></a>Literal Conversions</h2> <p>You can write code in EUC-JP as follows:</p> <pre><code>my $Rakuda = "\xF1\xD1\xF1\xCC"; # Camel in Kanji
             #&lt;-char-&gt;&lt;-char-&gt;   # 4 octets
s/\bCamel\b/$Rakuda/;</code></pre> <p>And with <code>use encoding "euc-jp"</code> in effect, it is the same thing as the code in UTF-8:</p> <pre><code>my $Rakuda = "\x{99F1}\x{99DD}"; # two Unicode Characters
s/\bCamel\b/$Rakuda/;</code></pre> <h2 id="PerlIO-layers-for-STD(IN|OUT)"><a class="permalink" href="#PerlIO-layers-for-STD(IN%7COUT)">#</a><a id="PerlIO"></a><a id="PerlIO-layers-for-STD-IN-OUT"></a>PerlIO layers for <code>STD(IN|OUT)</code></h2> <p>The <b>encoding</b> pragma also modifies the filehandle layers of STDIN and STDOUT to the specified encoding.
Therefore,</p> <pre><code>use encoding "euc-jp";
my $message = "Camel is the symbol of perl.\n";
my $Rakuda = "\xF1\xD1\xF1\xCC"; # Camel in Kanji
$message =~ s/\bCamel\b/$Rakuda/;
print $message;</code></pre> <p>Will print "\xF1\xD1\xF1\xCC is the symbol of perl.\n", not "\x{99F1}\x{99DD} is the symbol of perl.\n".</p> <p>You can override this by giving extra arguments; see below.</p> <h2 id="Implicit-upgrading-for-byte-strings"><a class="permalink" href="#Implicit-upgrading-for-byte-strings">#</a><a id="Implicit"></a>Implicit upgrading for byte strings</h2> <p>By default, if strings operating under byte semantics and strings with Unicode character data are concatenated, the new string will be created by decoding the byte strings as <i>ISO 8859-1 (Latin-1)</i>.</p> <p>The <b>encoding</b> pragma changes this to use the specified encoding instead. For example:</p> <pre><code>use encoding 'utf8';
my $string = chr(20000); # a Unicode string
utf8::encode($string);   # now it's a UTF-8 encoded byte string
# concatenate with another Unicode string
print length($string . chr(20000));</code></pre> <p>Will print <code>2</code>, because <code>$string</code> is upgraded as UTF-8. Without <code>use encoding 'utf8';</code>, it will print <code>4</code> instead, since <code>$string</code> is three octets when interpreted as Latin-1.</p> <h2 id="Side-effects"><a class="permalink" href="#Side-effects">#</a><a id="Side"></a>Side effects</h2> <p>If the <code>encoding</code> pragma is in scope then the lengths returned are calculated from the length of <code>$/</code> in Unicode characters, which is not always the same as the length of <code>$/</code> in the native encoding.</p> <p>This pragma affects utf8::upgrade, but not utf8::downgrade.</p> <h1 id="FEATURES-THAT-REQUIRE-5.8.1"><a class="permalink" href="#FEATURES-THAT-REQUIRE-5.8.1">#</a><a id="FEATURES"></a>FEATURES THAT REQUIRE 5.8.1</h1> <p>Some of the features offered by this pragma require perl 5.8.1.
Most of these were implemented by Inaba Hiroto. All other features and changes work in 5.8.0.</p> <dl> <dt id="&quot;NON-EUC&quot;-doublebyte-encodings"><a class="permalink" href="#%22NON-EUC%22-doublebyte-encodings">#</a><a id="NON-EUC-doublebyte-encodings"></a>"NON-EUC" doublebyte encodings</dt> <dd> <p>Because perl needs to parse the script before applying this pragma, encodings such as Shift_JIS and Big-5, which may contain '\' (BACKSLASH; \x5c) in the second byte, fail because that second byte may accidentally escape the quoting character that follows. Perl 5.8.1 or later fixes this problem.</p> </dd> <dt id="tr//"><a class="permalink" href="#tr//">#</a><a id="tr"></a>tr//</dt> <dd> <p><code>tr//</code> was overlooked by Perl 5 porters when they released perl 5.8.0. See the section below for details.</p> </dd> <dt id="DATA-pseudo-filehandle"><a class="permalink" href="#DATA-pseudo-filehandle">#</a><a id="DATA"></a>DATA pseudo-filehandle</dt> <dd> <p>Another feature that was overlooked was <code>DATA</code>.</p> </dd> </dl> <h1 id="USAGE"><a class="permalink" href="#USAGE">#</a>USAGE</h1> <dl> <dt id="use-encoding-[ENCNAME]-;"><a class="permalink" href="#use-encoding-%5BENCNAME%5D-;">#</a><a id="use"></a><a id="use-encoding-ENCNAME"></a>use encoding [<i>ENCNAME</i>] ;</dt> <dd> <p>Sets the script encoding to <i>ENCNAME</i>. Unless ${^UNICODE} exists and is non-zero, the PerlIO layers of STDIN and STDOUT are also set to ":encoding(<i>ENCNAME</i>)".</p> <p>Note that STDERR WILL NOT be changed.</p> <p>Also note that non-STD file handles remain unaffected. Use <code>use open</code> or <code>binmode</code> to change layers of those.</p> <p>If no encoding is specified, the environment variable <a href="/5.18.4/PERL_ENCODING">PERL_ENCODING</a> is consulted.
If no encoding can be found, the error <code>Unknown encoding '<i>ENCNAME</i>'</code> will be thrown.</p> </dd> <dt id="use-encoding-ENCNAME-[-STDIN-=>-ENCNAME_IN-...]-;"><a class="permalink" href="#use-encoding-ENCNAME-%5B-STDIN-=%3E-ENCNAME_IN-...%5D-;">#</a><a id="use1"></a><a id="use-encoding-ENCNAME-STDIN-ENCNAME_IN"></a>use encoding <i>ENCNAME</i> [ STDIN => <i>ENCNAME_IN</i> ...] ;</dt> <dd> <p>You can also set the encodings of STDIN and STDOUT individually via the <code>STDIN => <i>ENCNAME</i></code> form. In this case, you cannot omit the first <i>ENCNAME</i>. <code>STDIN => undef</code> turns the IO transcoding completely off.</p> <p>When ${^UNICODE} exists and is non-zero, these options are completely ignored. ${^UNICODE} is a variable introduced in perl 5.8.1. See <a href="/5.18.4/perlrun">perlrun</a>, <a href="/5.18.4/perlvar#%24%7B%5EUNICODE%7D">"${^UNICODE}" in perlvar</a>, and <a href="/5.18.4/perlrun#-C">"-C" in perlrun</a> for details (perl 5.8.1 and later).</p> </dd> <dt id="use-encoding-ENCNAME-Filter=>1;"><a class="permalink" href="#use-encoding-ENCNAME-Filter=%3E1;">#</a><a id="use2"></a><a id="use-encoding-ENCNAME-Filter-1"></a>use encoding <i>ENCNAME</i> Filter=>1;</dt> <dd> <p>This turns the encoding pragma into a source filter. While the default approach just decodes interpolated literals (in qq() and qr()), this will apply a source filter to the entire source code. See <a href="#The-Filter-Option">"The Filter Option"</a> below for details.</p> </dd> <dt id="no-encoding;"><a class="permalink" href="#no-encoding;">#</a><a id="no"></a><a id="no-encoding"></a>no encoding;</dt> <dd> <p>Unsets the script encoding. The layers of STDIN and STDOUT are reset to ":raw" (the default unprocessed raw stream of bytes).</p> </dd> </dl> <h1 id="The-Filter-Option"><a class="permalink" href="#The-Filter-Option">#</a><a id="The"></a>The Filter Option</h1> <p>The magic of <code>use encoding</code> is not applied to the names of identifiers.
In order to make <code>${"\x{4eba}"}++</code> ($human++, where human is a single Han ideograph) work, you still need to write your script in UTF-8 -- or use a source filter. That's what 'Filter=>1' does.</p> <p>What does this mean? Your source code behaves as if it is written in UTF-8 with 'use utf8' in effect. So even if your editor only supports Shift_JIS, for example, you can still try examples in Chapter 15 of <code>Programming Perl, 3rd Ed.</code>. For instance, you can use UTF-8 identifiers.</p> <p>This option is significantly slower. Also, as of this writing, non-ASCII identifiers are not very stable WITHOUT this option when the source code is written in UTF-8.</p> <h2 id="Filter-related-changes-at-Encode-version-1.87"><a class="permalink" href="#Filter-related-changes-at-Encode-version-1.87">#</a><a id="Filter"></a>Filter-related changes at Encode version 1.87</h2> <ul> <li><p>The Filter option now sets STDIN and STDOUT like the non-filter options. <code>STDIN=><i>ENCODING</i></code> and <code>STDOUT=><i>ENCODING</i></code> also work as they do in the non-filter version.</p> </li> <li><p><code>use utf8</code> is implicitly declared, so you no longer have to <code>use utf8</code> to make <code>${"\x{4eba}"}++</code> work.</p> </li> </ul> <h1 id="CAVEATS"><a class="permalink" href="#CAVEATS">#</a>CAVEATS</h1> <h2 id="NOT-SCOPED"><a class="permalink" href="#NOT-SCOPED">#</a><a id="NOT"></a>NOT SCOPED</h2> <p>The pragma applies per script; it is not lexically scoped per block. Only the last <code>use encoding</code> or <code>no encoding</code> matters, and it affects <b>the whole script</b>. However, the <code>no encoding</code> pragma is supported and <b>use encoding</b> can appear as many times as you want in a given script. Using this pragma more than once is discouraged.</p> <p>For the same reason, using this pragma inside modules is also discouraged (though not as strongly as in the case above; see below).</p> <p>If you still have to write a module with this pragma, be very careful of the load order.
See the code below:</p> <pre><code># called module
package Module_IN_BAR;
use encoding "bar";
# stuff in "bar" encoding here
1;

# caller script
use encoding "foo";
use Module_IN_BAR;
# surprise! use encoding "bar" is in effect.</code></pre> <p>The best way to avoid this oddity is to use this pragma RIGHT AFTER other modules are loaded, i.e.</p> <pre><code>use Module_IN_BAR;
use encoding "foo";</code></pre> <h2 id="DO-NOT-MIX-MULTIPLE-ENCODINGS"><a class="permalink" href="#DO-NOT-MIX-MULTIPLE-ENCODINGS">#</a><a id="DO"></a>DO NOT MIX MULTIPLE ENCODINGS</h2> <p>Notice that only literals (string or regular expression) having only legacy code points are affected: if you mix data like this</p> <pre><code class="plaintext">\xDF\x{100}</code></pre> <p>the data is assumed to be in (Latin 1 and) Unicode, not in your native encoding. In other words, this will match in "greek":</p> <pre><code class="plaintext">"\xDF" =~ /\x{3af}/</code></pre> <p>but this will not</p> <pre><code class="plaintext">"\xDF\x{100}" =~ /\x{3af}\x{100}/</code></pre> <p>since the <code>\xDF</code> (ISO 8859-7 GREEK SMALL LETTER IOTA WITH TONOS) on the left will <b>not</b> be upgraded to <code>\x{3af}</code> (Unicode GREEK SMALL LETTER IOTA WITH TONOS) because of the <code>\x{100}</code> in the same left-hand string. You should not be mixing your legacy data and Unicode in the same string.</p> <p>This pragma also affects encoding of the 0x80..0xFF code point range: normally characters in that range are left as eight-bit bytes (unless they are combined with characters with code points 0x100 or larger, in which case all characters need to become UTF-8 encoded), but if the <code>encoding</code> pragma is present, even the 0x80..0xFF range always gets UTF-8 encoded.</p> <p>After all, the best thing about this pragma is that you don't have to resort to \x{....} just to spell your name in a native encoding.
So feel free to put your strings in your encoding in quotes and regexes.</p> <h2 id="tr///-with-ranges"><a class="permalink" href="#tr///-with-ranges">#</a><a id="tr1"></a><a id="tr-with-ranges"></a>tr/// with ranges</h2> <p>The <b>encoding</b> pragma works by decoding string literals in <code>q//, qq//, qr//, qw//, qx//</code> and so forth. In perl 5.8.0, this does not apply to <code>tr///</code>. Therefore,</p> <pre><code>use encoding 'euc-jp';
#....
$kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/;
#           -------- -------- -------- --------</code></pre> <p>Does not work as</p> <pre><code>$kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;</code></pre> <dl> <dt id="Legend-of-characters-above"><a class="permalink" href="#Legend-of-characters-above">#</a><a id="Legend"></a>Legend of characters above</dt> <dd> <pre><code class="plaintext">utf8     euc-jp   charnames::viacode()
-----------------------------------------
\x{3041} \xA4\xA1 HIRAGANA LETTER SMALL A
\x{3093} \xA4\xF3 HIRAGANA LETTER N
\x{30a1} \xA5\xA1 KATAKANA LETTER SMALL A
\x{30f3} \xA5\xF3 KATAKANA LETTER N</code></pre> </dd> </dl> <p>This counterintuitive behavior has been fixed in perl 5.8.1.</p> <h3 id="workaround-to-tr///;"><a class="permalink" href="#workaround-to-tr///;">#</a><a id="workaround"></a><a id="workaround-to-tr"></a>workaround to tr///;</h3> <p>In perl 5.8.0, you can work around this as follows:</p> <pre><code>use encoding 'euc-jp';
#  ....
eval qq{ \$kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/ };</code></pre> <p>Note that the <code>tr//</code> expression is surrounded by <code>qq{}</code>. The idea is the same as the classic idiom that makes <code>tr///</code> 'interpolate'.</p> <pre><code>tr/$from/$to/;            # wrong!
eval qq{ tr/$from/$to/ }; # workaround.</code></pre> <p>Nevertheless, under the <b>encoding</b> pragma even <code>q//</code> is affected, so leaving <code>tr///</code> undecoded was clearly not what the Perl 5 Porters intended, and it has been fixed in Perl 5.8.1 and later.</p> <h1 id="EXAMPLE-Greekperl"><a class="permalink" href="#EXAMPLE-Greekperl">#</a><a id="EXAMPLE"></a><a id="EXAMPLE---Greekperl"></a>EXAMPLE - Greekperl</h1> <pre><code>use encoding "iso 8859-7";

# \xDF in ISO 8859-7 (Greek) is \x{3af} in Unicode.

$a = "\xDF";
$b = "\x{100}";

printf "%#x\n", ord($a); # will print 0x3af, not 0xdf

$c = $a . $b;

# $c will be "\x{3af}\x{100}", not "\x{df}\x{100}".

# chr() is affected, and ...

print "mega\n" if ord(chr(0xdf)) == 0x3af;

# ... ord() is affected by the encoding pragma ...

print "tera\n" if ord(pack("C", 0xdf)) == 0x3af;

# ... as are eq and cmp ...

print "peta\n" if "\x{3af}" eq pack("C", 0xdf);
print "exa\n"  if ("\x{3af}" cmp pack("C", 0xdf)) == 0;

# ... but pack/unpack C are not affected, in case you still
# want to go back to your native encoding

print "zetta\n" if unpack("C", (pack("C", 0xdf))) == 0xdf;</code></pre> <h1 id="KNOWN-PROBLEMS"><a class="permalink" href="#KNOWN-PROBLEMS">#</a><a id="KNOWN"></a>KNOWN PROBLEMS</h1> <dl> <dt id="literals-in-regex-that-are-longer-than-127-bytes"><a class="permalink" href="#literals-in-regex-that-are-longer-than-127-bytes">#</a><a id="literals"></a>literals in regex that are longer than 127 bytes</dt> <dd> <p>For native multibyte encodings (either fixed or variable length), the current implementation of regular expressions may introduce recoding errors for regular expression literals longer than 127 bytes.</p> </dd> <dt id="EBCDIC"><a class="permalink" href="#EBCDIC">#</a>EBCDIC</dt> <dd> <p>The encoding pragma is not supported on EBCDIC platforms.
(Porters who are willing and able to remove this limitation are welcome.)</p> </dd> <dt id="format"><a class="permalink" href="#format">#</a>format</dt> <dd> <p>This pragma doesn't work well with format because PerlIO does not get along very well with it. When a format contains non-ASCII characters, it prints garbled output or triggers "wide character" warnings. To see the problem, try the code below.</p> <pre><code># Save this one in utf8
# replace *non-ascii* with a non-ascii string
my $camel;
format STDOUT =
*non-ascii*@>>>>>>>
$camel
.
$camel = "*non-ascii*";
binmode(STDOUT=>':encoding(utf8)'); # bang!
write;              # funny
print $camel, "\n"; # fine</code></pre> <p>Without binmode, write() happens to work, but then print() fails instead.</p> <p>At any rate, the very use of format is questionable when it comes to Unicode characters, since you have to consider such things as character width (e.g. double-width for ideographs) and direction (e.g. BIDI for Arabic and Hebrew).</p> </dd> <dt id="Thread-safety"><a class="permalink" href="#Thread-safety">#</a><a id="Thread"></a>Thread safety</dt> <dd> <p><code>use encoding ...</code> is not thread-safe (i.e., do not use in threaded applications).</p> </dd> </dl> <h2 id="The-Logic-of-:locale"><a class="permalink" href="#The-Logic-of-:locale">#</a><a id="The1"></a>The Logic of :locale</h2> <p>The logic of <code>:locale</code> is as follows:</p> <ol> <li><p>If the platform supports the langinfo(CODESET) interface, the codeset returned is used as the default encoding for the open pragma.</p> </li> <li><p>If 1. didn't work but we are under the locale pragma, the environment variables LC_ALL and LANG (in that order) are matched for encodings (the part after <code>.</code>, if any), and if one is found, that is used as the default encoding for the open pragma.</p> </li> <li><p>If 1. and 2.
didn't work, the environment variables LC_ALL and LANG (in that order) are matched for anything looking like UTF-8, and if one is found, <code>:utf8</code> is used as the default encoding for the open pragma.</p> </li> </ol> <p>If your locale environment variables (LC_ALL, LC_CTYPE, LANG) contain the strings 'UTF-8' or 'UTF8' (case-insensitive matching), the default encoding of your STDIN, STDOUT, and STDERR, and of <b>any subsequent file open</b>, is UTF-8.</p> <h1 id="HISTORY"><a class="permalink" href="#HISTORY">#</a>HISTORY</h1> <p>This pragma first appeared in Perl 5.8.0. For features that require 5.8.1 or later, see above.</p> <p>The <code>:locale</code> subpragma was implemented in 2.01, or Perl 5.8.6.</p> <h1 id="SEE-ALSO"><a class="permalink" href="#SEE-ALSO">#</a><a id="SEE"></a>SEE ALSO</h1> <p><a href="/5.18.4/perlunicode">perlunicode</a>, <a href="/5.18.4/Encode">Encode</a>, <a href="/5.18.4/open">open</a>, <a href="/5.18.4/Filter::Util::Call">Filter::Util::Call</a></p> <p>Ch. 15 of <code>Programming Perl (3rd Edition)</code> by Larry Wall, Tom Christiansen, Jon Orwant; O'Reilly & Associates; ISBN 0-596-00027-8</p> </div> <div id="footer"> <p>Perldoc Browser is maintained by Dan Book (<a href="https://metacpan.org/author/DBOOK">DBOOK</a>). Please contact him via the <a href="https://github.com/Grinnz/perldoc-browser/issues">GitHub issue tracker</a> or <a href="mailto:dbook@cpan.org">email</a> regarding any issues with the site itself, search, or rendering of documentation.</p> <p>The Perl documentation is maintained by the Perl 5 Porters in the development of Perl.
Please contact them via the <a href="https://github.com/Perl/perl5/issues">Perl issue tracker</a>, the <a href="https://lists.perl.org/list/perl5-porters.html">mailing list</a>, or <a href="https://kiwiirc.com/client/irc.perl.org/p5p">IRC</a> to report any issues with the contents or format of the documentation.</p> </div> </div> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.slim.min.js" integrity="sha512-/DXTXr6nQodMUiq+IUJYCt2PPOUjrHJ9wFrqpJ3XkgPNOZVfMok7cRw6CSxyCQxXn6ozlESsSh1/sMCTF1rL/g==" crossorigin="anonymous"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.16.1/umd/popper.min.js" integrity="sha512-ubuT8Z88WxezgSqf3RLuNi5lmjstiJcyezx34yIU2gAHonIi27Na7atqzUZCOoY4CExaoFumzOsFQ2Ch+I/HCw==" crossorigin="anonymous"></script> <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js" integrity="sha384-B4gt1jrGC7Jh4AgTPSdUtOBvfO8shuf57BaghqFfPlYxofvL8/KUEfYiJOMMV+rV" crossorigin="anonymous"></script> <script src="/highlight.pack.js"></script> <script>hljs.highlightAll();</script> </body> </html>