<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <meta name="description" content="Blog posts from the Launchpad team" /> <title>Launchpad Blog</title> <link href="https://blog.launchpad.net/wp-content/themes/launchpad/style.css" rel="stylesheet" type="text/css" /> <link rel="shortcut icon" href="https://launchpad.net/@@/launchpad" /> <script type="text/javascript" src="https://blog.launchpad.net/wp-content/themes/launchpad/js/mootools-1.2-core.js"></script> <script type="text/javascript" src="https://blog.launchpad.net/wp-content/themes/launchpad/js/funcs.js"></script> <script type="text/javascript"> var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-12833497-3']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); </script> </head> <body> <!-- Header --> <div id="header"> <a href="/wp-admin" style="float:right; top: 2px;">Log in</a> <div id="finder"> <input type="search" accesskey="s" value="Search blog archives" name="s" id="s" /> <input type="hidden" name="blog_url" id="blog_url" value="https://blog.launchpad.net" /> <a href="https://blog.launchpad.net/feed" title="RSS Feed for Blog Entries"><img src="https://blog.launchpad.net/wp-content/themes/launchpad/images/rss.png" alt="RSS Feed" /></a> </div> <h1><a href="https://blog.launchpad.net" class="header-link"><img src="https://blog.launchpad.net/wp-content/themes/launchpad/images/logo.png" /><span class="logotext"> launchpad</span><strong>blog</strong></a></h1> </div> <div id="content" class="widecolumn"> <div class="navigation"> &laquo; <a href="https://blog.launchpad.net/notifications/bug-emails-now-use-the-bugs-address-in-the-from-header" rel="prev">Bug emails now use the bug&#8217;s address in the From: header</a> &nbsp;&nbsp;&nbsp; <a href="https://blog.launchpad.net/code/git-protocol-v2-available-at-launchpad" rel="next">Git Protocol v2 Available at Launchpad</a> &raquo; </div> <div class="post" id="post-4339"> <h2> <a href="https://blog.launchpad.net/general/login-regression-for-users-with-non-ascii-names" rel="bookmark" title="Permanent Link: Login regression for users with non-ASCII names">Login regression for users with non-ASCII names</a> </h2> <div class="entry"> <p>On 2020-08-13, we deployed an update that caused users whose full names contain non-ASCII characters (which is of course <a href="https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/">very common</a>) to be unable to log into Launchpad. We heard about this serious regression from users on 2020-08-17, and rolled out a fix on 2020-08-18. 
We&#8217;re sorry about this; it doesn&#8217;t meet the standards of both inclusion and quality that we set for ourselves. This post aims to explain what happened, the technical details of why it happened, and the steps we&#8217;ve taken to avoid it happening again.</p> <span id="more-4339"></span> <p>Launchpad still runs on Python 2. This is <a href="https://www.python.org/doc/sunset-python-2/">a problem</a>, and we&#8217;ve been gradually chipping away at it for the last couple of years. With about three-quarters of a million lines of Python code in the main tree and over 200 dependencies, it&#8217;s a big job &#8211; but we&#8217;re well underway!</p> <p>Some of those dependencies have been difficult problems in their own right. The one at issue here was <a href="https://pypi.org/project/python-openid/">python-openid</a>, which we use as part of our login workflow, but which hasn&#8217;t been actively maintained for over ten years. Fortunately, in this case we didn&#8217;t have to port it ourselves, because there were already a couple of forks featuring Python 3 support while preserving more or less the same interface: we chose <a href="https://pypi.org/project/python-openid2/">python-openid2</a> on the grounds that it had done a good job of maintaining both Python 2 and 3 support in the same codebase, which we needed in order to arrange a practical transition, and that it was in itself well-maintained. We worked with upstream to fix a couple of issues discovered by the Launchpad test suite that blocked us from migrating to it (notably <a href="https://github.com/ziima/python-openid/pull/41">PR #41</a>, although that was fixed as <a href="https://github.com/ziima/python-openid/pull/43">PR #43</a> instead), and <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/387907">switched Launchpad over</a> once python-openid2 3.2 was released.
So far, so good.</p> <p>A major motivation for much of the disruption in the Python 3 transition was to provide a clean separation between sequences of bytes and text strings, whose absence was often a problem for code that needed to handle Unicode: it&#8217;s all too common in Python 2 to have code that works on the ASCII subset (which can be represented either as <code>str</code> or <code>unicode</code>) but fails on Unicode strings outside that subset. Launchpad is less prone to that than many Python 2 applications because the <a href="https://en.wikipedia.org/wiki/Object-relational_mapping">ORM</a> we use (<a href="https://pypi.org/project/storm/">Storm</a>) has always been relatively strict about the boundary between bytes and text; nevertheless, having a stricter data model here is a good thing for us in the long term. It might seem ironic that we ran into exactly such a bug as part of porting to Python 3; but then, we aren&#8217;t using the new interpreter yet.</p> <p>Launchpad uses the <a href="https://openid.net/specs/openid-simple-registration-extension-1_0.html">OpenID Simple Registration Extension</a> in its login workflow. It specifically requests the user&#8217;s full name from Canonical&#8217;s OpenID provider (<a href="https://login.ubuntu.com/">login.ubuntu.com</a>, which we generally call &#8220;SSO&#8221;): this means that if the user has an SSO account but not yet a Launchpad account, we can create a Launchpad account for them without them needing to enter their name again. That full name is encoded as UTF-8, and the resulting bytes are in turn URL-encoded using the usual %xx mechanism.
This means that if, say, your name is <a href="https://en.wikipedia.org/wiki/Gr%C3%A1inne_N%C3%AD_Mh%C3%A1ille">Gráinne Ní Mháille</a>, it will show up in the OpenID response&#8217;s query string as <code>openid.sreg.fullname=Gr%C3%A1inne+N%C3%AD+Mh%C3%A1ille</code>.</p> <p>python-openid2 uses its <code>openid.urinorm</code> module to normalise parts of the response, decoding and re-encoding them to make sure comparisons work as expected; this is built on top of the URL handling code in Python&#8217;s standard library. Now, unlike its Python 3 counterpart, Python 2&#8217;s <a href="https://docs.python.org/2/library/urllib.html#urllib.urlencode">urlencode</a> has undocumented restrictions on values in the <code>query</code> argument: if the <code>doseq</code> argument is False (the default), then it converts values using <code>str(v)</code>, while if it&#8217;s True then it converts Unicode values using <code>v.encode("ASCII", "replace")</code> (potentially losing information!). In this case, <code>doseq</code> is False, and the input given to it is always text (<code>unicode</code> on Python 2): this works fine if the input is within the ASCII subset, but if it&#8217;s not:</p> <pre class="wp-block-code"><code>>>> from urllib import urlencode
>>> urlencode({u'openid.sreg.fullname': u'Gráinne Ní Mháille'})
Traceback (most recent call last):
  File "&lt;stdin>", line 1, in &lt;module>
  File "/usr/lib/python2.7/urllib.py", line 1350, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 2: ordinal not in range(128)</code></pre> <p>The fix is that on Python 2 one must always pass values to <code>urlencode</code> as bytes rather than text, for example by encoding them as UTF-8:</p> <pre class="wp-block-code"><code>>>> urlencode({u'openid.sreg.fullname': u'Gráinne Ní Mháille'.encode('UTF-8')})
'openid.sreg.fullname=Gr%C3%A1inne+N%C3%AD+Mh%C3%A1ille'</code></pre> <p>We&#8217;ve sent <a href="https://github.com/ziima/python-openid/pull/47">PR #47</a> to python-openid2 to implement this.
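</p> <p>To make the encoding rule concrete, here is a minimal sketch in Python 3. It is an illustration only, not the actual change in PR #47; the helper <code>encode_values</code> and the constant <code>FULLNAME_KEY</code> are hypothetical names. It encodes every text value as UTF-8 bytes before calling <code>urlencode</code>, which is what Python 2 requires. Python 3&#8217;s <code>urllib.parse.urlencode</code> accepts either text or bytes (encoding text as UTF-8 by default), so both forms should give the same query string as the Python 2 session above, and parsing that string back recovers the original name.</p> <pre class="wp-block-code"><code># Python 3 sketch. encode_values and FULLNAME_KEY are hypothetical names,
# not part of python-openid2's API.
from urllib.parse import parse_qs, urlencode

FULLNAME_KEY = "openid.sreg.fullname"


def encode_values(fields):
    """Return a copy of fields with each text value encoded as UTF-8 bytes.

    Bytes values are what Python 2's urlencode needs when doseq is False.
    Python 3's urlencode percent-encodes bytes values directly and encodes
    text values as UTF-8, so both inputs yield the same query string there.
    """
    return {
        key: value.encode("UTF-8") if isinstance(value, str) else value
        for key, value in fields.items()
    }


fields = {FULLNAME_KEY: "Gráinne Ní Mháille"}

from_text = urlencode(fields)
from_bytes = urlencode(encode_values(fields))
print(from_bytes)  # openid.sreg.fullname=Gr%C3%A1inne+N%C3%AD+Mh%C3%A1ille

# parse_qs decodes percent-escapes as UTF-8 by default, recovering the name.
print(parse_qs(from_bytes)[FULLNAME_KEY][0])  # Gráinne Ní Mháille
</code></pre> <p>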
We&#8217;ve also made a temporary local fork of python-openid2 containing the PR #47 patch and deployed it to Launchpad production.</p> <p>One thing to be clear about here: though the root cause was a bug in python-openid2, it&#8217;s <em>our</em> responsibility to make sure it works correctly when integrated into Launchpad.</p> <p>We missed this bug because of a gap in testing: although we did test the full login workflow, we only did so with a test user whose full name was entirely ASCII. We&#8217;ve <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/389437">closed this gap</a> now, so we&#8217;ll catch it if a dependency regresses in future.</p> <p>Tags: <a href="https://blog.launchpad.net/tag/bug-fixes-2" rel="tag">bug-fixes</a></p> <p class="postmetadata alt"> <small>This entry was posted by <strong>Colin Watson</strong> on Thursday, August 20th, 2020 at 10:01 am and is filed under <a href="https://blog.launchpad.net/category/general" rel="category tag">General</a>. You can follow any responses to this entry through the <a href="https://blog.launchpad.net/general/login-regression-for-users-with-non-ascii-names/feed">RSS 2.0</a> feed. You can <a href="#respond">leave a response</a>, or <a href="https://blog.launchpad.net/general/login-regression-for-users-with-non-ascii-names/trackback" rel="trackback">trackback</a> from your own site. </small> </p> </div> </div> <!-- You can start editing here.
--> <h3 id="comments">One Response to &#8220;Login regression for users with non-ASCII names&#8221;</h3> <ol class="commentlist"> <li class="alt" id="comment-708580"> <img alt='' src='https://secure.gravatar.com/avatar/?s=32&#038;d=blank&#038;r=g' srcset='https://secure.gravatar.com/avatar/?s=64&#038;d=blank&#038;r=g 2x' class='avatar avatar-32 photo avatar-default' height='32' width='32' loading='lazy'/> <cite>Francisco Jiménez Cabrera</cite> Says: <br /> <small class="commentmetadata"><a href="#comment-708580" title="">August 20th, 2020 at 9:56 pm</a> </small> <p>Nice job!</p> </li> </ol> <h3 id="respond">Leave a Reply</h3> <form action="https://blog.launchpad.net/wp-comments-post.php" method="post" id="commentform"> <p><input type="text" name="author" id="author" value="" size="22" tabindex="1" /> <label for="author"><small>Name </small></label></p> <p><input type="text" name="email" id="email" value="" size="22" tabindex="2" /> <label for="email"><small>Mail (will not be published) </small></label></p> <p><input type="text" name="url" id="url" value="" size="22" tabindex="3" /> <label for="url"><small>Website</small></label></p> <!--<p><small><strong>XHTML:</strong> You can use these tags: <code>&lt;a href=&quot;&quot; title=&quot;&quot;&gt; &lt;abbr title=&quot;&quot;&gt; &lt;acronym title=&quot;&quot;&gt; &lt;b&gt; &lt;blockquote cite=&quot;&quot;&gt; &lt;cite&gt; &lt;code&gt; &lt;del datetime=&quot;&quot;&gt; &lt;em&gt; &lt;i&gt; &lt;q cite=&quot;&quot;&gt; &lt;s&gt; &lt;strike&gt; &lt;strong&gt; </code></small></p>--> <p><textarea name="comment" id="comment" cols="100%" rows="10" tabindex="4"></textarea></p> <p><input name="submit" type="submit" id="submit" tabindex="5" value="Submit Comment" /> <input type="hidden" name="comment_post_ID" value="4339" /> </p> <p style="display: none;"><input type="hidden" id="akismet_comment_nonce" name="akismet_comment_nonce" value="18da3433a5" /></p><p style="display: none !important;"><label>&#916;<textarea
name="ak_hp_textarea" cols="45" rows="8" maxlength="100"></textarea></label><input type="hidden" id="ak_js_1" name="ak_js" value="93"/><script>document.getElementById( "ak_js_1" ).setAttribute( "value", ( new Date() ).getTime() );</script></p> </form> </div> <div id="footer"> <p> <a href="https://help.launchpad.net/Legal">Terms of use</a> | <a href="https://launchpad.net/feedback">Help improve Launchpad</a> | <a href="https://launchpad.net/faq">FAQ</a> </p> <p><a rel="license" href="http://creativecommons.org/licenses/by/2.0/uk/"> <span xmlns:dc="http://purl.org/dc/elements/1.1/" href="http://purl.org/dc/dcmitype/Text" property="dc:title" rel="dc:type">Launchpad Blog</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="https://canonical.com/" property="cc:attributionName" rel="cc:attributionURL">Canonical Ltd</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/2.0/uk/">Creative Commons Attribution 2.0 UK: England &amp; Wales License</a>. <img alt="Creative Commons License" style="border-width:0;vertical-align:middle;" src="https://i.creativecommons.org/l/by/2.0/uk/80x15.png" /></a></p> <p>&copy; 2004-2019 <a href="https://canonical.com/" target="_blank">Canonical Limited.</a></p> </div>
