<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <meta name="description" content="Blog posts from the Launchpad team" /> <title>Launchpad Blog</title> <link href="https://blog.launchpad.net/wp-content/themes/launchpad/style.css" rel="stylesheet" type="text/css" /> <link rel="shortcut icon" href="https://launchpad.net/@@/launchpad" /> </head> <body> <!-- Header --> <div id="header"> <h1><a href="https://blog.launchpad.net" class="header-link"><img src="https://blog.launchpad.net/wp-content/themes/launchpad/images/logo.png" /><span class="logotext"> launchpad</span><strong>blog</strong></a></h1> </div> <div id="content" class="widecolumn"> <div class="post" id="post-4339"> <h2> <a href="https://blog.launchpad.net/general/login-regression-for-users-with-non-ascii-names" rel="bookmark" title="Permanent Link: Login regression for users with non-ASCII names">Login regression for users with non-ASCII names</a> </h2> <div class="entry"> <p>On 2020-08-13, we deployed an update that caused users whose full names contain non-ASCII characters (which is of course <a href="https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/">very common</a>) to be unable to log into Launchpad. We heard about this serious regression from users on 2020-08-17, and rolled out a fix on 2020-08-18. 
We’re sorry about this; it doesn’t meet the standards of both inclusion and quality that we set for ourselves. This post aims to explain what happened, technical details of why it happened, and the steps we’ve taken to avoid it happening again.</p> <span id="more-4339"></span> <p>Launchpad still runs on Python 2. This is <a href="https://www.python.org/doc/sunset-python-2/">a problem</a>, and we’ve been gradually chipping away at it for the last couple of years. With about three-quarters of a million lines of Python code in the main tree and over 200 dependencies, it’s a big job – but we’re well underway!</p> <p>Some of those dependencies have been difficult problems in their own right. The one at issue here was <a href="https://pypi.org/project/python-openid/">python-openid</a>, which we use as part of our login workflow, but which hasn’t been actively maintained for over ten years. Fortunately, in this case we didn’t have to port it ourselves, because there were already a couple of forks featuring Python 3 support while preserving more or less the same interface: we chose <a href="https://pypi.org/project/python-openid2/">python-openid2</a> on the grounds that it had done a good job of maintaining both Python 2 and 3 support in the same codebase, which we needed in order to arrange a practical transition, and that it was in itself well-maintained. We worked with upstream to fix a couple of issues discovered by the Launchpad test suite that blocked us migrating to it (notably <a href="https://github.com/ziima/python-openid/pull/41">PR #41</a>, although that was fixed as <a href="https://github.com/ziima/python-openid/pull/43">PR #43</a> instead), and <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/387907">switched Launchpad over</a> once python-openid2 3.2 was released. 
So far, so good.</p> <p>One of the major reasons for much of the disruption in the Python 3 transition was to provide a clean separation between the concept of a sequence of bytes and a text string, which was often a problem for code that needed to handle Unicode: it’s all too common in Python 2 to have code that works on the ASCII domain (which can be represented either as <code>str</code> or <code>unicode</code>) but that fails on Unicode strings outside that subset. Launchpad is less prone to that than many Python 2 applications because the <a href="https://en.wikipedia.org/wiki/Object-relational_mapping">ORM</a> we use (<a href="https://pypi.org/project/storm/">Storm</a>) has always been relatively strict about the boundary between bytes and text; nevertheless, having a stricter data model here is a good thing for us in the long term. It might seem ironic that we ran into exactly such a bug as part of porting to Python 3; but then, we aren’t using the new interpreter yet.</p> <p>Launchpad uses the <a href="https://openid.net/specs/openid-simple-registration-extension-1_0.html">OpenID Simple Registration Extension</a> in its login workflow. It specifically requests the user’s full name from Canonical’s OpenID provider (<a href="https://login.ubuntu.com/">login.ubuntu.com</a>, which we generally call “SSO”): this means that if the user has an SSO account but not yet a Launchpad account, we can create a Launchpad account for them without them needing to enter their name again. That full name is encoded as a UTF-8 string, which in turn is URL-encoded using the usual %xx mechanism. 
This means that if, say, your name is <a href="https://en.wikipedia.org/wiki/Gr%C3%A1inne_N%C3%AD_Mh%C3%A1ille">Gráinne Ní Mháille</a>, it will show up in the OpenID response’s query string as <code>openid.sreg.fullname=Gr%C3%A1inne+N%C3%AD+Mh%C3%A1ille</code>.</p> <p>python-openid2 uses its <code>openid.urinorm</code> module to normalise parts of the response, decoding and re-encoding it to make sure comparisons work as expected; this is built on top of the URL handling code in Python’s standard library. Now, unlike Python 3, Python 2’s <a href="https://docs.python.org/2/library/urllib.html#urllib.urlencode">urlencode</a> has undocumented restrictions on values in the <code>query</code> argument: if the <code>doseq</code> argument is False (the default), then it converts values using <code>str(v)</code>, while if it’s True then it converts Unicode values using <code>v.encode("ASCII", "replace")</code> (potentially losing information!). In this case, <code>doseq</code> is False, and the input given to it is always text (<code>unicode</code> on Python 2): this works fine if the input is within the ASCII subset, but if it’s not:</p> <pre class="wp-block-code"><code>&gt;&gt;&gt; from urllib import urlencode
&gt;&gt;&gt; urlencode({u'openid.sreg.fullname': u'Gráinne Ní Mháille'})
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
  File "/usr/lib/python2.7/urllib.py", line 1350, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 2: ordinal not in range(128)</code></pre> <p>The fix is that on Python 2 one must always pass values to <code>urlencode</code> as bytes rather than text:</p> <pre class="wp-block-code"><code>&gt;&gt;&gt; urlencode({u'openid.sreg.fullname': u'Gráinne Ní Mháille'.encode('UTF-8')})
'openid.sreg.fullname=Gr%C3%A1inne+N%C3%AD+Mh%C3%A1ille'</code></pre> <p>We’ve sent <a href="https://github.com/ziima/python-openid/pull/47">PR #47</a> to python-openid2 to implement this. 
We’ve also made a temporary local fork of python-openid2 containing this patch and deployed it to Launchpad production.</p> <p>One thing to be clear about here: though the root cause was a bug in python-openid2, it’s <em>our</em> responsibility to make sure it works correctly when integrated into Launchpad.</p> <p>We missed this bug because of a gap in testing: although we did test the full login workflow, we only did so with a test user whose full name was entirely ASCII. We’ve <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/389437">closed this gap</a> now, so we’ll catch it if a dependency regresses in future.</p> <p>Tags: <a href="https://blog.launchpad.net/tag/bug-fixes-2" rel="tag">bug-fixes</a></p> <p class="postmetadata alt"> <small>This entry was posted by <strong>Colin Watson</strong> on Thursday, August 20th, 2020 at 10:01 am and is filed under <a href="https://blog.launchpad.net/category/general" rel="category tag">General</a>. You can follow any responses to this entry through the <a href="https://blog.launchpad.net/general/login-regression-for-users-with-non-ascii-names/feed">RSS 2.0</a> feed. You can <a href="#respond">leave a response</a>, or <a href="https://blog.launchpad.net/general/login-regression-for-users-with-non-ascii-names/trackback" rel="trackback">trackback</a> from your own site. </small> </p> </div> </div> <!-- You can start editing here. 
--> <h3 id="comments">One Response to “Login regression for users with non-ASCII names”</h3> <ol class="commentlist"> <li class="alt" id="comment-708580"> <img alt='' src='https://secure.gravatar.com/avatar/?s=32&d=blank&r=g' srcset='https://secure.gravatar.com/avatar/?s=64&d=blank&r=g 2x' class='avatar avatar-32 photo avatar-default' height='32' width='32' loading='lazy'/> <cite>Francisco Jiménez Cabrera</cite> Says: <br /> <small class="commentmetadata"><a href="#comment-708580" title="">August 20th, 2020 at 9:56 pm</a> </small> <p>Nice job!</p> </li> </ol> <h3 id="respond">Leave a Reply</h3> <form action="https://blog.launchpad.net/wp-comments-post.php" method="post" id="commentform"> <p><input type="text" name="author" id="author" value="" size="22" tabindex="1" /> <label for="author"><small>Name </small></label></p> <p><input type="text" name="email" id="email" value="" size="22" tabindex="2" /> <label for="email"><small>Mail (will not be published) </small></label></p> <p><input type="text" name="url" id="url" value="" size="22" tabindex="3" /> <label for="url"><small>Website</small></label></p> <!--<p><small><strong>XHTML:</strong> You can use these tags: <code><a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> </code></small></p>--> <p><textarea name="comment" id="comment" cols="100%" rows="10" tabindex="4"></textarea></p> <p><input name="submit" type="submit" id="submit" tabindex="5" value="Submit Comment" /> <input type="hidden" name="comment_post_ID" value="4339" /> </p> <p style="display: none;"><input type="hidden" id="akismet_comment_nonce" name="akismet_comment_nonce" value="18da3433a5" /></p><p style="display: none !important;"><label>Δ<textarea name="ak_hp_textarea" cols="45" rows="8" maxlength="100"></textarea></label><input type="hidden" id="ak_js_1" name="ak_js" value="93"/><script>document.getElementById( "ak_js_1" ).setAttribute( "value", ( 
new Date() ).getTime() );</script></p> </form> </div> <div id="footer"> <p> <a href="https://help.launchpad.net/Legal">Terms of use</a> | <a href="https://launchpad.net/feedback">Help improve Launchpad</a> | <a href="https://launchpad.net/faq">FAQ</a> </p> <p><a rel="license" href="http://creativecommons.org/licenses/by/2.0/uk/"> <span xmlns:dc="http://purl.org/dc/elements/1.1/" href="http://purl.org/dc/dcmitype/Text" property="dc:title" rel="dc:type">Launchpad Blog</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="https://canonical.com/" property="cc:attributionName" rel="cc:attributionURL">Canonical Ltd</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/2.0/uk/">Creative Commons Attribution 2.0 UK: England & Wales License</a>. <img alt="Creative Commons License" style="border-width:0;vertical-align:middle;" src="https://i.creativecommons.org/l/by/2.0/uk/80x15.png" /></a></p> <p>© 2004-2019 <a href="https://canonical.com/" target="_blank">Canonical Limited.</a></p> </div>