LKML: Roland Dreier: [PATCH 06/13] [RFC] ipath LLD core, part 3
Subject: [PATCH 06/13] [RFC] ipath LLD core, part 3
Date: Fri, 16 Dec 2005 15:48:55 -0800
From: Roland Dreier <>

Last part of core driver

---

 drivers/infiniband/hw/ipath/ipath_driver.c | 2380 ++++++++++++++++++++++++++++
 1 files changed, 2380 insertions(+), 0 deletions(-)

f7ffc0cabd62be5e13ad84027d5712e6f92d9cc1
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index 0dee4ce..87b6dae 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -4877,3 +4877,2383 @@ static int ipath_wait_intr(ipath_portdat
 	}
 	return 0;
 }
+
+/*
+ * The new implementation as of Oct 2004 is that the driver assigns
+ * the tid and returns it to the caller.  To make it easier to
+ * catch bugs, and to reduce search time, we keep a cursor for
+ * each port, walking the shadow tid array to find one that's not
+ * in use.
+ *
+ * For now, if we can't allocate the full list, we fail, although
+ * in the long run, we'll allocate as many as we can, and the
+ * caller will deal with that by trying the remaining pages later.
+ * That means that when we fail, we have to mark the tids as not in
+ * use again, in our shadow copy.
+ *
+ * It's up to the caller to free the tids when they are done.
+ * We'll unlock the pages as they free them.
+ *
+ * Also, right now we are locking one page at a time, but since
+ * the intended use of this routine is for a single group of
+ * virtually contiguous pages, that should change to improve
+ * performance.
+ */
+static int ipath_tid_update(ipath_portdata *pd, struct _tidupd *tidu)
+{
+	int ret = 0, ntids;
+	uint32_t tid, porttid, cnt, i, tidcnt;
+	struct _tidupd tu;
+	uint16_t *tidlist;
+	ipath_devdata *dd = &devdata[pd->port_unit];
+	uint64_t vaddr, physaddr, lenvalid;
+	volatile uint64_t *tidbase;
+	uint64_t tidmap[8];
+	struct page **pagep = NULL;
+
+	tu.tidcnt = 0;	/* for early errors */
+	if (!dd->ipath_pageshadow) {
+		ret = -ENOMEM;
+		goto done;
+	}
+	if (copy_from_user(&tu, tidu, sizeof tu)) {
+		ret = -EFAULT;
+		goto done;
+	}
+
+	if (!(cnt = tu.tidcnt)) {
+		_IPATH_DBG("After copyin, tidcnt 0, tidlist %llx\n",
+			   tu.tidlist);
+		/* or should we treat as success?  likely a bug */
+		ret = -EFAULT;
+		goto done;
+	}
+	tidcnt = dd->ipath_rcvtidcnt;
+	if (cnt >= tidcnt) {	/* make sure it all fits in port_tid_pg_list */
+		_IPATH_INFO
+		    ("Process tried to allocate %u TIDs, only trying max (%u)\n",
+		     cnt, tidcnt);
+		cnt = tidcnt;
+	}
+	pagep = (struct page **)pd->port_tid_pg_list;
+	tidlist = (uint16_t *) (&pagep[cnt]);
+
+	memset(tidmap, 0, sizeof(tidmap));
+	tid = pd->port_tidcursor;
+	/* before decrement; chip actual # */
+	porttid = pd->port_port * tidcnt;
+	ntids = tidcnt;
+	tidbase = (volatile uint64_t *)((volatile char *)
+					(devdata[pd->port_unit].ipath_kregbase) +
+					devdata[pd->port_unit].ipath_rcvtidbase +
+					porttid * sizeof(*tidbase));
+
+	_IPATH_VDBG("Port%u %u tids, cursor %u, tidbase %p\n", pd->port_port,
+		    cnt, tid, tidbase);
+
+	vaddr = tu.tidvaddr;	/* virtual address of first page in transfer */
+	if (!access_ok(VERIFY_WRITE, (void *)vaddr, cnt * PAGE_SIZE)) {
+		_IPATH_DBG("Fail vaddr %llx, %u pages, !access_ok\n",
+			   vaddr, cnt);
+		ret = -EFAULT;
+		goto done;
+	}
+	if ((ret = ipath_mlock((unsigned long)vaddr, cnt, pagep))) {
+		if (ret == -EBUSY) {
+			_IPATH_DBG
+			    ("Failed to lock addr %p, %u pages (already locked)\n",
+			     (void *)vaddr, cnt);
+			/*
+			 * for now, continue, and see what happens,
+			 * but with the new implementation, this should
+			 * never happen, unless perhaps the user has
+			 * mpin'ed the pages themselves (something we
+			 * need to test)
+			 */
+			ret = 0;
+		} else {
+			_IPATH_INFO
+			    ("Failed to lock addr %p, %u pages: errno %d\n",
+			     (void *)vaddr, cnt, -ret);
+			goto done;
+		}
+	}
+	for (i = 0; i < cnt; i++, vaddr += PAGE_SIZE) {
+		for (; ntids--; tid++) {
+			if (tid == tidcnt)
+				tid = 0;
+			if (!dd->ipath_pageshadow[porttid + tid])
+				break;
+		}
+		if (ntids < 0) {
+			/*
+			 * oops, wrapped all the way through their TIDs,
+			 * and didn't have enough free; see comments at
+			 * start of routine
+			 */
+			_IPATH_DBG
+			    ("Not enough free TIDs for %u pages (index %d), failing\n",
+			     cnt, i);
+			i--;	/* last tidlist[i] not filled in */
+			ret = -ENOMEM;
+			break;
+		}
+		tidlist[i] = tid;
+		_IPATH_VDBG("Updating idx %u to TID %u, vaddr %llx\n",
+			    i, tid, vaddr);
+		/* for now we "know" system pages and TID pages are same size */
+		/* for ipath_free_tid */
+		dd->ipath_pageshadow[porttid + tid] = pagep[i];
+		__set_bit(tid, tidmap);	/* don't need atomic or it's overhead */
+		physaddr = page_to_phys(pagep[i]);
+		ipath_stats.sps_pagelocks++;
+		_IPATH_VDBG("TID %u, vaddr %llx, physaddr %llx pgp %p\n",
+			    tid, vaddr, physaddr, pagep[i]);
+		/*
+		 * in words (fixed, full page).  could make less for very last
+		 * page in transfer, but for now we won't worry about it.
+		 */
+		lenvalid = PAGE_SIZE >> 2;
+		lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT;
+		physaddr |= lenvalid | INFINIPATH_RT_VALID;
+		ipath_kput_memq(pd->port_unit, &tidbase[tid], physaddr);
+		/*
+		 * don't check this tid in ipath_portshadow, since we
+		 * just filled it in; start with the next one.
+		 */
+		tid++;
+	}
+
+	if (ret) {
+		uint32_t limit;
+		uint64_t tidval;
+		/*
+		 * chip errata bug 7358, try to work around it by
+		 * marking invalid tids as having max length
+		 */
+		tidval =
+		    (~0ULL & INFINIPATH_RT_BUFSIZE_MASK) <<
+		    INFINIPATH_RT_BUFSIZE_SHIFT;
+cleanup:
+		/* jump here if copy out of updated info failed... */
+		_IPATH_DBG("After failure (ret=%d), undo %d of %d entries\n",
+			   -ret, i, cnt);
+		/* same code that's in ipath_free_tid() */
+		if ((limit = sizeof(tidmap) * _BITS_PER_BYTE) > tidcnt)
+			/* just in case size changes in future */
+			limit = tidcnt;
+		tid = find_first_bit((const unsigned long *)tidmap, limit);
+		/*
+		 * chip errata bug 7358, try to work around it by
+		 * marking invalid tids as having max length
+		 */
+		tidval =
+		    (~0ULL & INFINIPATH_RT_BUFSIZE_MASK) <<
+		    INFINIPATH_RT_BUFSIZE_SHIFT;
+		for (; tid < limit; tid++) {
+			if (!test_bit(tid, tidmap))
+				continue;
+			if (dd->ipath_pageshadow[porttid + tid]) {
+				_IPATH_VDBG("Freeing TID %u\n", tid);
+				ipath_kput_memq(pd->port_unit, &tidbase[tid],
+						tidval);
+				dd->ipath_pageshadow[porttid + tid] = NULL;
+				ipath_stats.sps_pageunlocks++;
+			}
+		}
+		(void)ipath_munlock(cnt, pagep);
+	} else {
+		/*
+		 * copy the updated array, with ipath_tid's filled in,
+		 * back to user.  Since we did the copy in already, this
+		 * "should never fail".  If it does, we have to clean up...
+		 */
+		int r;
+		if ((r =
+		     copy_to_user((void *)tu.tidlist, tidlist,
+				  cnt * sizeof(*tidlist)))) {
+			_IPATH_DBG
+			    ("Failed to copy out %d TIDs (%lx bytes) to %llx (ret %x)\n",
+			     cnt, cnt * sizeof(*tidlist), tu.tidlist, r);
+			ret = -EFAULT;
+			goto cleanup;
+		}
+		if (copy_to_user((void *)tu.tidmap, tidmap, sizeof tidmap)) {
+			_IPATH_DBG("Failed to copy out TID map to %llx\n",
+				   tu.tidmap);
+			ret = -EFAULT;
+			goto cleanup;
+		}
+		if (tid == tidcnt)
+			tid = 0;
+		pd->port_tidcursor = tid;
+	}
+
+done:
+	if (ret)
+		_IPATH_DBG
+		    ("Failed to map %u TID pages, failing with %d, tidu %p\n",
+		     tu.tidcnt, -ret, tidu);
+	return ret;
+}
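
A minimal standalone sketch of the cursor walk the comment above describes
(names here are illustrative, not the driver's): each port remembers where
the last search ended, and the next search resumes there, wrapping once
around a shadow array in which a NULL entry means "free".

    #define NSLOTS 512

    /* hypothetical per-port state; the driver's equivalents are
     * pd->port_tidcursor and dd->ipath_pageshadow */
    static void *shadow[NSLOTS];
    static unsigned cursor;

    /* returns a free slot index, or -1 if all NSLOTS are in use */
    static int alloc_slot(void)
    {
        unsigned tries, tid = cursor;

        for (tries = 0; tries < NSLOTS; tries++, tid++) {
            if (tid == NSLOTS)
                tid = 0;              /* wrap, as the driver does */
            if (!shadow[tid]) {
                shadow[tid] = (void *)1;  /* mark in use */
                cursor = tid + 1;     /* resume here next time */
                return tid;
            }
        }
        return -1;                    /* wrapped all the way; none free */
    }
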
+
+/*
+ * right now we are unlocking one page at a time, but since
+ * the intended use of this routine is for a single group of
+ * virtually contiguous pages, that should change to improve
+ * performance.  We check that the TID is in range for this port,
+ * but otherwise don't check validity; if the user has an error and
+ * frees the wrong tid, it's only their own data that can thereby
+ * be corrupted.  We do check that the TID was in use, for sanity.
+ * We always use our idea of the saved address, not the address that
+ * they pass in to us.
+ */
+static int ipath_tid_free(ipath_portdata *pd, struct _tidupd *tidu)
+{
+	int ret = 0;
+	uint32_t tid, porttid, cnt, limit, tidcnt;
+	struct _tidupd tu;
+	ipath_devdata *dd = &devdata[pd->port_unit];
+	uint64_t *tidbase;
+	uint64_t tidmap[8];
+	uint64_t tidval;
+
+	tu.tidcnt = 0;	/* for early errors */
+	if (!dd->ipath_pageshadow) {
+		ret = -ENOMEM;
+		goto done;
+	}
+
+	if (copy_from_user(&tu, tidu, sizeof tu)) {
+		_IPATH_DBG("copy of tidupd structure failed\n");
+		ret = -EFAULT;
+		goto done;
+	}
+	if (copy_from_user(tidmap, (void *)tu.tidmap, sizeof tidmap)) {
+		_IPATH_DBG("copy of tidmap failed\n");
+		ret = -EFAULT;
+		goto done;
+	}
+
+	porttid = pd->port_port * dd->ipath_rcvtidcnt;
+	tidbase =
+	    (uint64_t *) ((char *)(devdata[pd->port_unit].ipath_kregbase) +
+			  devdata[pd->port_unit].ipath_rcvtidbase +
+			  porttid * sizeof(*tidbase));
+
+	tidcnt = dd->ipath_rcvtidcnt;
+	if ((limit = sizeof(tidmap) * _BITS_PER_BYTE) > tidcnt)
+		limit = tidcnt;	/* just in case size changes in future */
+	tid = find_first_bit((const unsigned long *)tidmap, limit);
+	_IPATH_VDBG
+	    ("Port%u free %u tids; first bit (max=%d) set is %d, porttid %u\n",
+	     pd->port_port, tu.tidcnt, limit, tid, porttid);
+	/*
+	 * chip errata bug 7358, try to work around it by marking invalid
+	 * tids as having max length
+	 */
+	tidval =
+	    (~0ULL & INFINIPATH_RT_BUFSIZE_MASK) << INFINIPATH_RT_BUFSIZE_SHIFT;
+	for (cnt = 0; tid < limit; tid++) {
+		/*
+		 * small optimization; if we detect a run of 3 or so without
+		 * any set, use find_first_bit again.  That's mainly to
+		 * accelerate the case where we wrapped, so we have some at
+		 * the beginning, and some at the end, and a big gap
+		 * in the middle.
+		 */
+		if (!test_bit(tid, tidmap))
+			continue;
+		cnt++;
+		if (dd->ipath_pageshadow[porttid + tid]) {
+			_IPATH_VDBG("Freeing TID %u\n", tid);
+			ipath_kput_memq(pd->port_unit, &tidbase[tid], tidval);
+			ipath_munlock(1, &dd->ipath_pageshadow[porttid + tid]);
+			dd->ipath_pageshadow[porttid + tid] = NULL;
+			ipath_stats.sps_pageunlocks++;
+		} else
+			_IPATH_DBG("Unused tid %u, ignoring\n", tid);
+	}
+	if (cnt != tu.tidcnt)
+		_IPATH_DBG("passed in tidcnt %d, only %d bits set in map\n",
+			   tu.tidcnt, cnt);
+done:
+	if (ret)
+		_IPATH_DBG("Failed to unmap %u TID pages, failing with %d\n",
+			   tu.tidcnt, -ret);
+	return ret;
+}
+
+/* called from user init code, and also layered driver init */
+int ipath_setrcvhdrsize(const ipath_type mdev, unsigned rhdrsize)
+{
+	int ret = 0;
+	if (devdata[mdev].ipath_flags & IPATH_RCVHDRSZ_SET) {
+		if (devdata[mdev].ipath_rcvhdrsize != rhdrsize) {
+			_IPATH_INFO
+			    ("Error: can't set protocol header size %u, already %u\n",
+			     rhdrsize, devdata[mdev].ipath_rcvhdrsize);
+			ret = -EAGAIN;
+		} else
+			/* OK if set already, with same value, nothing to do */
+			_IPATH_VDBG("Reuse same protocol header size %u\n",
+				    devdata[mdev].ipath_rcvhdrsize);
+	} else if (rhdrsize >
+		   (devdata[mdev].ipath_rcvhdrentsize -
+		    (sizeof(uint64_t) / sizeof(uint32_t)))) {
+		_IPATH_DBG
+		    ("Error: can't set protocol header size %u (> max %u)\n",
+		     rhdrsize,
+		     devdata[mdev].ipath_rcvhdrentsize -
+		     (uint32_t) (sizeof(uint64_t) / sizeof(uint32_t)));
+		ret = -EOVERFLOW;
+	} else {
+		devdata[mdev].ipath_flags |= IPATH_RCVHDRSZ_SET;
+		devdata[mdev].ipath_rcvhdrsize = rhdrsize;
+		ipath_kput_kreg(mdev, kr_rcvhdrsize,
+				devdata[mdev].ipath_rcvhdrsize);
+		_IPATH_VDBG("Set protocol header size to %u\n",
+			    devdata[mdev].ipath_rcvhdrsize);
+	}
+	return ret;
+}
+
+/*
+ * find an available pio buffer, and do appropriate marking as busy, etc.
+ * returns buffer number if one found (>=0), negative number is error.
+ * Used by ipath_send_smapkt and ipath_layer_send
+ */
+int ipath_getpiobuf(int mdev)
+{
+	int i, j, starti, updated = 0;
+	unsigned piobcnt, iter;
+	unsigned long flags;
+	ipath_devdata *dd = &devdata[mdev];
+	uint64_t *shadow = dd->ipath_pioavailshadow;
+
+	piobcnt = (unsigned)dd->ipath_piobcnt;
+	starti = dd->ipath_lastport_piobuf;
+	iter = piobcnt - starti;
+	if (dd->ipath_upd_pio_shadow) {
+		/*
+		 * minor optimization.  If we had no buffers on last call,
+		 * start out by doing the update; continue and do scan
+		 * even if no buffers were updated, to be paranoid
+		 */
+		ipath_update_pio_bufs(mdev);
+		/* we scanned here, don't do it at end of scan */
+		updated = 1;
+		i = starti;
+	} else
+		i = dd->ipath_lastpioindex;
+
+rescan:
+	/*
+	 * while test_and_set_bit() is atomic,
+	 * we do that and then the change_bit(), and the pair is not.
+	 * See if this is the cause of the remaining armlaunch errors.
+	 */
+	spin_lock_irqsave(&ipath_pioavail_lock, flags);
+	for (j = 0; j < iter; j++, i++) {
+		if (i >= piobcnt)
+			i = starti;
+		/*
+		 * To avoid bus lock overhead, we first find a candidate
+		 * buffer, then do the test and set, and continue if
+		 * that fails.
+		 */
+		if (test_bit((2 * i) + 1, shadow) ||
+		    test_and_set_bit((2 * i) + 1, shadow)) {
+			continue;
+		}
+		/* flip generation bit */
+		change_bit(2 * i, shadow);
+		break;
+	}
+	spin_unlock_irqrestore(&ipath_pioavail_lock, flags);
+
+	if (j == iter) {
+		/*
+		 * first time through; shadow exhausted, but may be
+		 * real buffers available, so go see; if any updated,
+		 * rescan (once)
+		 */
+		if (!updated) {
+			ipath_update_pio_bufs(mdev);
+			updated = 1;
+			i = starti;
+			goto rescan;
+		}
+		dd->ipath_upd_pio_shadow = 1;
+		/* not atomic, but if we lose one once in a while, that's OK */
+		ipath_stats.sps_nopiobufs++;
+		if (!(++dd->ipath_consec_nopiobuf % 100000)) {
+			_IPATH_DBG
+			    ("%u pio sends with no bufavail; dmacopy: %llx %llx %llx %llx; shadow: %llx %llx %llx %llx\n",
+			     dd->ipath_consec_nopiobuf,
+			     dd->ipath_pioavailregs_dma[0],
+			     dd->ipath_pioavailregs_dma[1],
+			     dd->ipath_pioavailregs_dma[2],
+			     dd->ipath_pioavailregs_dma[3],
+			     shadow[0], shadow[1], shadow[2], shadow[3]);
+			/*
+			 * 4 buffers per byte, 4 registers above, cover
+			 * rest below
+			 */
+			if (dd->ipath_piobcnt > (sizeof(shadow[0]) * 4 * 4))
+				_IPATH_DBG
+				    ("2nd group: dmacopy: %llx %llx %llx %llx; shadow: %llx %llx %llx %llx\n",
+				     dd->ipath_pioavailregs_dma[4],
+				     dd->ipath_pioavailregs_dma[5],
+				     dd->ipath_pioavailregs_dma[6],
+				     dd->ipath_pioavailregs_dma[7],
+				     shadow[4], shadow[5], shadow[6],
+				     shadow[7]);
+		}
+		return -EBUSY;
+	}
+
+	if (updated && dd->ipath_layer.l_intr) {
+		/*
+		 * ran out of bufs, now some (at least this one we just got)
+		 * are now available, so tell the layered driver.
+		 */
+		dd->ipath_layer.l_intr(mdev, IPATH_LAYER_INT_SEND_CONTINUE);
+	}
+
+	/*
+	 * set next starting place.  Since it's just an optimization,
+	 * it doesn't matter who wins on this, so no locking
+	 */
+	dd->ipath_lastpioindex = i + 1;
+	if (dd->ipath_upd_pio_shadow)
+		dd->ipath_upd_pio_shadow = 0;
+	if (dd->ipath_consec_nopiobuf)
+		dd->ipath_consec_nopiobuf = 0;
+	return i;
+}
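
Per the comments in ipath_getpiobuf() above, the shadow array packs two
bits per PIO buffer: bit 2*i+1 is the busy bit claimed with
test_and_set_bit(), and bit 2*i is a generation bit flipped on each
allocation. A hedged sketch of just the bit arithmetic (the spinlock,
atomics, and chip update are deliberately omitted):

    #include <stdint.h>

    /* one uint64_t word covers 32 buffers at 2 bits each */
    static inline int buf_busy(const uint64_t *shadow, unsigned i)
    {
        return (shadow[(2 * i + 1) / 64] >> ((2 * i + 1) % 64)) & 1;
    }

    static inline void buf_claim(uint64_t *shadow, unsigned i)
    {
        shadow[(2 * i + 1) / 64] |= 1ULL << ((2 * i + 1) % 64); /* busy */
        shadow[(2 * i) / 64] ^= 1ULL << ((2 * i) % 64); /* flip generation */
    }
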
+
+/*
+ * this is like ipath_getpiobuf(), except it just probes to see if a buffer
+ * is available.  If it returns that there is one, it's not allocated,
+ * and so may not be available if caller tries to send.
+ * NOTE: This can be called from interrupt context by ipath_intr()
+ * and from non-interrupt context by layer_send_getpiobuf().
+ */
+int ipath_bufavail(int mdev)
+{
+	int i;
+	unsigned piobcnt;
+	uint64_t *shadow = devdata[mdev].ipath_pioavailshadow;
+
+	piobcnt = (unsigned)devdata[mdev].ipath_piobcnt;
+
+	for (i = devdata[mdev].ipath_lastport_piobuf; i < piobcnt; i++)
+		if (!test_bit((2 * i) + 1, shadow))
+			return 1;
+
+	/* if none, check for update and rescan if we updated */
+	ipath_update_pio_bufs(mdev);
+	for (i = devdata[mdev].ipath_lastport_piobuf; i < piobcnt; i++)
+		if (!test_bit((2 * i) + 1, shadow))
+			return 1;
+	_IPATH_PDBG("No bufs avail\n");
+	return 0;
+}
+
+/*
+ * This routine is no longer on any critical paths; it is used only
+ * for sending SMA packets, but that could change in the future, so it
+ * should be kept pretty tight, with anything that
+ * increases the cache footprint, adds branches, etc. carefully
+ * examined, and if needed only for unusual cases, should be moved out to
+ * a separate routine, or out of the main execution path.
+ * Because it's currently sma only, there are no checks to see if the
+ * link is up; sma must be able to send in the not fully initialized state
+ */
+int ipath_send_smapkt(struct ipath_sendpkt *upkt)
+{
+	int i, ret = 0, whichpb;
+	uint32_t *piobuf, plen = 0, clen;
+	uint64_t pboff;
+	struct ipath_sendpkt kpkt;
+	struct ipath_iovec *iov = kpkt.sps_iov;
+	ipath_type t;
+
+	if (unlikely((copy_from_user(&kpkt, upkt, sizeof kpkt))))
+		ret = -EFAULT;
+	if (ret) {
+		_IPATH_VDBG("Send failed: error %d\n", -ret);
+		goto done;
+	}
+	t = kpkt.sps_flags;
+	if (t >= infinipath_max || !(devdata[t].ipath_flags & IPATH_PRESENT) ||
+	    !devdata[t].ipath_kregbase) {
+		_IPATH_SMADBG("illegal unit %u for sma send\n", t);
+		return -ENODEV;
+	}
+	if (!(devdata[t].ipath_flags & IPATH_INITTED)) {
+		/* no hardware, freeze, etc. */
+		_IPATH_SMADBG("unit %u not usable\n", t);
+		return -ENODEV;
+	}
+
+	/* need total length before first word written */
+	plen = sizeof(uint32_t);	/* +1 word is for the qword padding */
+	for (i = 0; i < kpkt.sps_cnt; i++)
+		/* each must be dword multiple */
+		plen += kpkt.sps_iov[i].iov_len;
+
+	if ((plen + 4) > devdata[t].ipath_ibmaxlen) {
+		_IPATH_DBG("Pkt len 0x%x > ibmaxlen %x!\n", plen - 4,
+			   devdata[t].ipath_ibmaxlen);
+		ret = -EINVAL;
+		goto done;	/* before writing pbc */
+	}
+	plen >>= 2;	/* in words */
+
+	whichpb = ipath_getpiobuf(t);
+	if (whichpb < 0) {
+		ret = whichpb;
+		devdata[t].ipath_nosma_bufs++;
+		_IPATH_SMADBG("No PIO buffers available unit %u %u times\n",
+			      t, devdata[t].ipath_nosma_bufs);
+		goto done;
+	}
+	if (devdata[t].ipath_nosma_bufs) {
+		_IPATH_SMADBG
+		    ("Unit %u got SMA send buffer after %u failures, %u seconds\n",
+		     t, devdata[t].ipath_nosma_bufs, devdata[t].ipath_nosma_secs);
+		devdata[t].ipath_nosma_bufs = 0;
+		devdata[t].ipath_nosma_secs = 0;
+	}
+	if ((devdata[t].ipath_lastibcstat & 0x11) != 0x11 &&
+	    (devdata[t].ipath_lastibcstat & 0x21) != 0x21) {
+		/*
+		 * we need to be at least at INIT for SMA packets to go out.
+		 * If we aren't, something has gone wrong, and SMA hasn't
+		 * noticed.  Therefore we'll try to go to INIT here, in hopes
+		 * of fixing up the problem.  First we verify that indeed the
+		 * state is still "bad" (that is, that lastibcstat isn't
+		 * "stale")
+		 */
+		uint64_t val;
+		val = ipath_kget_kreg64(t, kr_ibcstatus);
+		if ((val & 0x11) != 0x11 && (val & 0x21) != 0x21) {
+			_IPATH_SMADBG
+			    ("Invalid Link state 0x%llx unit %u for send, try INIT\n",
+			     val, t);
+			ipath_set_ib_lstate(t, INFINIPATH_IBCC_LINKCMD_INIT);
+			val = ipath_kget_kreg64(t, kr_ibcstatus);
+			if ((val & 0x11) != 0x11 && (val & 0x21) != 0x21)
+				_IPATH_SMADBG
+				    ("Link state still not OK unit %u (0x%llx) after INIT\n",
+				     t, val);
+			else
+				_IPATH_SMADBG
+				    ("Link state OK unit %u (0x%llx) after INIT\n",
+				     t, val);
+		}
+		/* and continue, regardless */
+	}
+
+	pboff = devdata[t].ipath_piobufbase;
+	piobuf = (uint32_t *) (((char *)(devdata[t].ipath_kregbase)) + pboff
+			       + whichpb * devdata[t].ipath_palign);
+
+	if (infinipath_debug & __IPATH_PKTDBG)	/* SMA and PKT, both */
+		_IPATH_SMADBG("unit %u 0x%x+1w pio%d, (scnt %d)\n",
+			      t, plen - 1, whichpb, kpkt.sps_cnt);
+
+	ret = 0;
+	clen = 2;	/* size of the pbc */
+	{
+		/*
+		 * If this code ever gets used for anything performance
+		 * oriented, or that isn't inherently single-threaded,
+		 * then I need to implement the original idea of our
+		 * own equivalent of copy_from_user that uses only dword
+		 * or qword copies.  copy_from_user() can use byte copies,
+		 * and that is a problem for our chip.
+		 */
+		static uint32_t tmpbuf[2176 / sizeof(uint32_t)];
+		*(uint64_t *) tmpbuf = (uint64_t) plen;
+		for (i = 0; i < kpkt.sps_cnt; i++) {
+			if (unlikely
+			    (copy_from_user
+			     (tmpbuf + clen, (void *)iov->iov_base,
+			      iov->iov_len)))
+				ret = -EFAULT;	/* no break */
+			clen += iov->iov_len >> 2;
+			iov++;
+		}
+		ipath_dwordcpy(piobuf, tmpbuf, clen);
+	}
+
+	/* flush the packet out now, don't leave it waiting around */
+	mb();
+
+	if (ret) {
+		/*
+		 * Packet is bad, so we need to use the PIO abort mechanism to
+		 * abort the packet
+		 */
+		uint32_t sendctrl;
+		sendctrl = devdata[t].ipath_sendctrl | INFINIPATH_S_DISARM |
+		    (whichpb << INFINIPATH_S_DISARMPIOBUF_SHIFT);
+		_IPATH_DBG("Doing PIO abort on buffer %u after error\n",
+			   whichpb);
+		ipath_kput_kreg(t, kr_sendctrl, sendctrl);
+	}
+
+done:
+	return ret;
+}
+
+/*
+ * implementation of the ioctl to get the counter values from the chip.
+ * For the time being, we get all of them when asked, no shadowing.
+ * We need to shadow the byte counters at a minimum, because otherwise
+ * they will wrap in just a few seconds at full bandwidth.
+ * The second argument is the user address to which we do the copy_to_user()
+ */
+static int ipath_get_counters(ipath_type t,
+			      struct infinipath_counters *ucounters)
+{
+	int ret = 0;
+	uint64_t val;
+	uint64_t *ucreg;
+	uint16_t vcreg;
+
+	ucreg = (uint64_t *) ucounters;
+	/*
+	 * for now, let's do this one at a time.  It's not the most
+	 * optimal method, but it is simple, and has no intermediate
+	 * memory requirements.
+	 */
+	for (vcreg = 0;
+	     vcreg < (sizeof(struct infinipath_counters) / sizeof(val));
+	     vcreg++, ucreg++) {
+		ipath_creg creg = vcreg;
+		val = ipath_snap_cntr(t, creg);
+		if ((ret = copy_to_user(ucreg, &val, sizeof(val)))) {
+			_IPATH_DBG("copy_to_user error on counter %d\n", creg);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * implementation of the ioctl to get the stats values from the driver.
+ * The argument is the user address to which we do the copy_to_user()
+ */
+static int ipath_get_stats(struct infinipath_stats *ustats)
+{
+	int ret = 0;
+
+	if ((ret = copy_to_user(ustats, &ipath_stats, sizeof(ipath_stats))))
+		_IPATH_DBG("copy_to_user error on driver stats\n");
+
+	return ret;
+}
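
Stepping back to ipath_send_smapkt() above: its block comment notes that
copy_from_user() may fall back to byte stores, which the chip's PIO
buffers cannot accept, so the payload is staged in tmpbuf and pushed out
with ipath_dwordcpy(). A plausible shape for such a helper (the driver's
real one lives elsewhere in this patch series; this is only a sketch) is
a loop that issues strictly 32-bit stores:

    #include <stdint.h>

    /* copy nwords 32-bit words using only dword-sized accesses */
    static void dwordcpy_sketch(volatile uint32_t *dst,
                                const uint32_t *src, uint32_t nwords)
    {
        while (nwords--)
            *dst++ = *src++;
    }
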
+
+/*
+ * set a partition key.  We can have up to 4 active at a time (other than
+ * the default, which is always allowed).  This is somewhat tricky, since
+ * multiple ports may set the same key, so we reference count them, and
+ * clean up at exit.  All 4 partition keys are packed into a single
+ * infinipath register.  It's an error for a process to set the same
+ * pkey multiple times.  We provide no mechanism to de-allocate a pkey
+ * at this time, we may eventually need to do that.
+ * I've used the atomic operations, and no locking, and only make a single
+ * pass through what's available.  This should be more than adequate for
+ * some time.  I'll think about spinlocks or the like if and as it's
+ * necessary.
+ */
+static int ipath_set_partkey(ipath_portdata *pd, uint16_t key)
+{
+	ipath_devdata *dd;
+	int i, any = 0, pidx = -1;
+	uint16_t lkey = key & 0x7FFF;
+
+	dd = &devdata[pd->port_unit];
+
+	if (lkey == (IPS_DEFAULT_P_KEY & 0x7FFF)) {
+		/* nothing to do; this key always valid */
+		return 0;
+	}
+
+	_IPATH_VDBG
+	    ("p%u try to set pkey %hx, current keys %hx:%x %hx:%x %hx:%x %hx:%x\n",
+	     pd->port_port, key, dd->ipath_pkeys[0],
+	     atomic_read(&dd->ipath_pkeyrefs[0]), dd->ipath_pkeys[1],
+	     atomic_read(&dd->ipath_pkeyrefs[1]), dd->ipath_pkeys[2],
+	     atomic_read(&dd->ipath_pkeyrefs[2]), dd->ipath_pkeys[3],
+	     atomic_read(&dd->ipath_pkeyrefs[3]));
+
+	if (!lkey) {
+		_IPATH_PRDBG("p%u tries to set key 0, not allowed\n",
+			     pd->port_port);
+		return -EINVAL;
+	}
+
+	/*
+	 * Set the full membership bit, because it has to be
+	 * set in the register or the packet, and it seems
+	 * cleaner to set in the register than to force all
+	 * callers to set it.  (see bug 4331)
+	 */
+	key |= 0x8000;
+
+	for (i = 0; i < ARRAY_SIZE(pd->port_pkeys); i++) {
+		if (!pd->port_pkeys[i] && pidx == -1)
+			pidx = i;
+		if (pd->port_pkeys[i] == key) {
+			_IPATH_VDBG
+			    ("p%u tries to set same pkey (%x) more than once\n",
+			     pd->port_port, key);
+			return -EEXIST;
+		}
+	}
+	if (pidx == -1) {
+		_IPATH_DBG
+		    ("All pkeys for port %u already in use, can't set %x\n",
+		     pd->port_port, key);
+		return -EBUSY;
+	}
+	for (any = i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) {
+		if (!dd->ipath_pkeys[i]) {
+			any++;
+			continue;
+		}
+		if (dd->ipath_pkeys[i] == key) {
+			if (atomic_inc_return(&dd->ipath_pkeyrefs[i]) > 1) {
+				pd->port_pkeys[pidx] = key;
+				_IPATH_VDBG
+				    ("p%u set key %x matches #%d, count now %d\n",
+				     pd->port_port, key, i,
+				     atomic_read(&dd->ipath_pkeyrefs[i]));
+				return 0;
+			} else {
+				/* lost race, decrement count, catch below */
+				atomic_dec(&dd->ipath_pkeyrefs[i]);
+				_IPATH_VDBG
+				    ("Lost race, count was 0, after dec, it's %d\n",
+				     atomic_read(&dd->ipath_pkeyrefs[i]));
+				any++;
+			}
+		}
+		if ((dd->ipath_pkeys[i] & 0x7FFF) == lkey) {
+			/*
+			 * It makes no sense to have both the limited and full
+			 * membership PKEY set at the same time since the
+			 * unlimited one will disable the limited one.
+			 */
+			return -EEXIST;
+		}
+	}
+	if (!any) {
+		_IPATH_DBG
+		    ("port %u, all pkeys already in use, can't set %x\n",
+		     pd->port_port, key);
+		return -EBUSY;
+	}
+	for (any = i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) {
+		if (!dd->ipath_pkeys[i] &&
+		    atomic_inc_return(&dd->ipath_pkeyrefs[i]) == 1) {
+			uint64_t pkey;
+
+			/* for ipathstats, etc. */
+			ipath_stats.sps_pkeys[i] = lkey;
+			pd->port_pkeys[pidx] = dd->ipath_pkeys[i] = key;
+			pkey =
+			    (uint64_t) dd->ipath_pkeys[0] |
+			    ((uint64_t) dd->ipath_pkeys[1] << 16) |
+			    ((uint64_t) dd->ipath_pkeys[2] << 32) |
+			    ((uint64_t) dd->ipath_pkeys[3] << 48);
+			_IPATH_PRDBG
+			    ("p%u set key %x in #%d, portidx %d, new pkey reg %llx\n",
+			     pd->port_port, key, i, pidx, pkey);
+			ipath_kput_kreg(pd->port_unit, kr_partitionkey, pkey);
+
+			return 0;
+		}
+	}
+	_IPATH_DBG
+	    ("port %u, all pkeys already in use 2nd pass, can't set %x\n",
+	     pd->port_port, key);
+	return -EBUSY;
+}
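
As the comment above ipath_set_partkey() says, the four active pkeys are
packed 16 bits apiece into one 64-bit register. A worked sketch of the
packing and unpacking (the layout follows the code above; the helper
names are illustrative):

    #include <stdint.h>

    static uint64_t pack_pkeys(const uint16_t pkeys[4])
    {
        return (uint64_t) pkeys[0] |
               ((uint64_t) pkeys[1] << 16) |
               ((uint64_t) pkeys[2] << 32) |
               ((uint64_t) pkeys[3] << 48);
    }

    static uint16_t unpack_pkey(uint64_t reg, unsigned i)
    {
        return (uint16_t) (reg >> (16 * i));
    }

    /* e.g. pack_pkeys((uint16_t[4]){ 0xffff, 0x8001, 0, 0 })
     *      == 0x000000008001ffffULL */
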
+
+/*
+ * start_stop == 0 disables receive on the port, for use in queue overflow
+ * conditions.  start_stop == 1 re-enables, and returns value of tail
+ * register, to be used to re-init the software copy of the head register
+ */
+static int ipath_manage_rcvq(ipath_portdata *pd, uint16_t start_stop)
+{
+	ipath_devdata *dd;
+	/*
+	 * This needs to be volatile, so that the compiler doesn't
+	 * optimize away the read to the device's mapped memory.
+	 */
+	volatile uint64_t tval;
+
+	dd = &devdata[pd->port_unit];
+	_IPATH_PRDBG("%sabling rcv for unit %u port %u\n",
+		     start_stop ? "en" : "dis", pd->port_unit, pd->port_port);
+	/* atomically set or clear receive enable for the port */
+	if (start_stop) {
+		/*
+		 * on enable, force in-memory copy of the tail register
+		 * to 0, so that protocol code doesn't have to worry
+		 * about whether or not the chip has yet updated
+		 * the in-memory copy or not on return from the system
+		 * call.  The chip always resets its tail register back
+		 * to 0 on a transition from disabled to enabled.
+		 * This could cause a problem if software was broken,
+		 * and did the enable w/o the disable, but eventually
+		 * the in-memory copy will be updated and correct
+		 * itself, even in the face of software bugs.
+		 */
+		*pd->port_rcvhdrtail_kvaddr = 0;
+		atomic_set_mask(1U <<
+				(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port),
+				&dd->ipath_rcvctrl);
+	} else
+		atomic_clear_mask(1U <<
+				  (INFINIPATH_R_PORTENABLE_SHIFT +
+				   pd->port_port), &dd->ipath_rcvctrl);
+	ipath_kput_kreg(pd->port_unit, kr_rcvctrl, dd->ipath_rcvctrl);
+	/* now be sure chip saw it before we return */
+	tval = ipath_kget_kreg64(pd->port_unit, kr_scratch);
+	if (start_stop) {
+		/*
+		 * and try to be sure that tail reg update has happened
+		 * too.  This should in theory interlock with the RXE
+		 * changes to the tail register.  Don't assign it to
+		 * the tail register in memory copy, since we could
+		 * overwrite an update by the chip if we did.
+		 */
+		tval =
+		    ipath_kget_ureg32(pd->port_unit, ur_rcvhdrtail,
+				      pd->port_port);
+	}
+	/* always; new head should be equal to new tail; see above */
+	return 0;
+}
+
+/*
+ * This routine is now quite different for user and kernel, because
+ * the kernel uses skb's, for the accelerated network performance.
+ * This is the user port version.
+ *
+ * allocate the eager TID buffers and program them into infinipath.
+ * They are no longer completely contiguous, we do multiple
+ * alloc_pages() calls.
+ */
+static int ipath_create_user_egr(ipath_portdata *pd)
+{
+	char *buf;
+	ipath_devdata *dd = &devdata[pd->port_unit];
+	uint64_t *egrbase, egroff, lenvalid;
+	unsigned e, egrcnt, alloced, order, egrperchunk, chunk;
+	unsigned long pa, pent;
+
+	egrcnt = dd->ipath_rcvegrcnt;
+	egroff =
+	    dd->ipath_rcvegrbase + pd->port_port * egrcnt * sizeof(*egrbase);
+	egrbase = (uint64_t *) ((char *)(dd->ipath_kregbase) + egroff);
+	_IPATH_VDBG("Allocating %d egr buffers, at chip offset %llx (%p)\n",
+		    egrcnt, egroff, egrbase);
+
+	/*
+	 * to avoid wasting a lot of memory, we allocate 32KB chunks of
+	 * physically contiguous memory, advance through it until used up
+	 * and then allocate more.  Of course, we need memory to store
+	 * those extra pointers, now.  Started out with 256KB, but under
+	 * heavy memory pressure (creating large files and then copying
+	 * them over NFS while doing lots of MPI jobs), we hit some
+	 * alloc_pages() failures, even though we can sleep...  (2.6.10)
+	 * Still get failures at 64K.  32K is the lowest we can go without
+	 * waiting more memory again.  It seems likely that the coalescing
+	 * in free_pages, etc. still has issues (as it has had previously
+	 * during 2.6.x development).
+	 */
+	order = get_order(0x8000);
+	alloced =
+	    round_up(dd->ipath_rcvegrbufsize * egrcnt,
+		     (1 << order) * PAGE_SIZE);
+	egrperchunk = ((1 << order) * PAGE_SIZE) / dd->ipath_rcvegrbufsize;
+	chunk = (egrcnt + egrperchunk - 1) / egrperchunk;
+	pd->port_rcvegrbuf_chunks = chunk;
+	pd->port_rcvegrbufs_perchunk = egrperchunk;
+	pd->port_rcvegrbuf_order = order;
+	pd->port_rcvegrbuf_pages =
+	    vmalloc(chunk * sizeof(pd->port_rcvegrbuf_pages[0]));
+	pd->port_rcvegrbuf_virt =
+	    vmalloc(chunk * sizeof(pd->port_rcvegrbuf_virt[0]));
+	if (!pd->port_rcvegrbuf_pages || !pd->port_rcvegrbuf_virt) {
+		_IPATH_UNIT_ERROR(pd->port_unit,
+				  "Unable to allocate %u EGR buffer array pointers\n",
+				  chunk);
+		if (pd->port_rcvegrbuf_pages) {
+			vfree(pd->port_rcvegrbuf_pages);
+			pd->port_rcvegrbuf_pages = NULL;
+		}
+		if (pd->port_rcvegrbuf_virt) {
+			vfree(pd->port_rcvegrbuf_virt);
+			pd->port_rcvegrbuf_virt = NULL;
+		}
+		return -ENOMEM;
+	}
+	for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) {
+		/*
+		 * GFP_USER, but without GFP_FS, so buffer cache can
+		 * be coalesced (we hope); otherwise, even at order 4, heavy
+		 * filesystem activity makes these fail
+		 */
+		pd->port_rcvegrbuf_pages[e] =
+		    alloc_pages(__GFP_WAIT | __GFP_IO, order);
+		if (!pd->port_rcvegrbuf_pages[e]) {
+			_IPATH_UNIT_ERROR(pd->port_unit,
+					  "Unable to allocate EGR buffer array %u/%u\n",
+					  e, pd->port_rcvegrbuf_chunks);
+			vfree(pd->port_rcvegrbuf_pages);
+			pd->port_rcvegrbuf_pages = NULL;
+			vfree(pd->port_rcvegrbuf_virt);
+			pd->port_rcvegrbuf_virt = NULL;
+			return -ENOMEM;
+		}
+	}
+
+	/*
+	 * calculate physical, then phys_to_virt()
+	 * so that we get an address that fits in 64 bits, so we can use
+	 * mmap64 from 32 bit programs on the chip and kernel virtual
+	 * addresses (mmap64 for 32 bit programs on i386 and x86_64
+	 * only has 44 bits of address, because it uses mmap2()).
+	 * We do this with the first chunk; we don't need a kernel
+	 * virtually contiguous address to give the user virtually
+	 * contiguous mappings.  It just complicates the nopage routine
+	 * a little tiny bit ;)
+	 */
+	buf = page_address(pd->port_rcvegrbuf_pages[0]);
+	pa = virt_to_phys(buf);
+	pd->port_rcvegr_phys = pa;
+
+	/* in words */
+	lenvalid = (dd->ipath_rcvegrbufsize - pd->port_egrskip) >> 2;
+	_IPATH_VDBG
+	    ("port%u egrbuf vaddr %p, cpu %d, egrskip %u, len %llx words\n",
+	     pd->port_port, buf, smp_processor_id(), pd->port_egrskip,
+	     lenvalid);
+	lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT;
+	lenvalid |= INFINIPATH_RT_VALID;
+
+	for (e = chunk = 0; chunk < pd->port_rcvegrbuf_chunks; chunk++) {
+		int i, n;
+		struct page *p;
+		p = pd->port_rcvegrbuf_pages[chunk];
+		pa = page_to_phys(p);
+		buf = page_address(p);
+		/*
+		 * stash away for later use, since page_address() lookup
+		 * is not cheap
+		 */
+		pd->port_rcvegrbuf_virt[chunk] = buf;
+		if (pa & ~INFINIPATH_RT_ADDR_MASK)
+			_IPATH_INFO
+			    ("physaddr %lx has more than 40 bits, using only 40!\n",
+			     pa);
+		n = 1 << pd->port_rcvegrbuf_order;
+		for (i = 0; i < n; i++)
+			SetPageReserved(virt_to_page(buf + (i * PAGE_SIZE)));
+
+		/* clear buffer for security, sanity, and debugging */
+		memset(buf, 0, PAGE_SIZE * n);
+
+		for (i = 0; e < egrcnt && i < egrperchunk; e++, i++) {
+			pent = ((pa + pd->port_egrskip) &
+				INFINIPATH_RT_ADDR_MASK) | lenvalid;
+
+			ipath_kput_memq(pd->port_unit, &egrbase[e], pent);
+			_IPATH_VDBG("egr %u phys %lx val %lx\n", e, pa, pent);
+			pa += dd->ipath_rcvegrbufsize;
+		}
+		yield();	/* don't hog the cpu */
+	}
+
+	return 0;
+}
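
The chunking arithmetic in ipath_create_user_egr() is worth restating with
concrete numbers: assuming 4KB pages, a 32KB chunk and (for example) a
2048-byte eager buffer give 16 buffers per chunk, and the chunk count is a
round-up division of the eager TID count. A small self-contained sketch of
the same computation (all sizes here are example values, not the chip's):

    #include <stdio.h>

    int main(void)
    {
        unsigned chunkbytes = 0x8000;   /* 32KB chunk, as above */
        unsigned egrbufsize = 2048;     /* example eager buffer size */
        unsigned egrcnt = 2049;         /* example eager TID count */
        unsigned perchunk = chunkbytes / egrbufsize;
        unsigned chunks = (egrcnt + perchunk - 1) / perchunk;

        /* prints: 16 per chunk, 129 chunks */
        printf("%u per chunk, %u chunks\n", perchunk, chunks);
        return 0;
    }
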
+
+/*
+ * This routine is now quite different for user and kernel, because
+ * the kernel uses skb's, for the accelerated network performance.
+ * This is the kernel (port0) version.
+ *
+ * Allocate the eager TID buffers and program them into infinipath.
+ * We use the network layer alloc_skb() allocator to allocate the memory, and
+ * either use the buffers as is for things like SMA packets, or pass
+ * the buffers up to the ipath layered driver and thence the network layer,
+ * replacing them as we do so (see ipath_kreceive())
+ */
+static int ipath_create_port0_egr(ipath_portdata *pd)
+{
+	int ret = 0;
+	uint64_t *egrbase, egroff;
+	unsigned e, egrcnt;
+	ipath_devdata *dd;
+	struct sk_buff **skbs;
+
+	dd = &devdata[pd->port_unit];
+	egrcnt = dd->ipath_rcvegrcnt;
+	egroff =
+	    dd->ipath_rcvegrbase + pd->port_port * egrcnt * sizeof(*egrbase);
+	egrbase = (uint64_t *) ((char *)(dd->ipath_kregbase) + egroff);
+	_IPATH_VDBG
+	    ("unit%u Allocating %d egr buffers, at chip offset %llx (%p)\n",
+	     pd->port_unit, egrcnt, egroff, egrbase);
+
+	skbs = vmalloc(sizeof(*dd->ipath_port0_skbs) * egrcnt);
+	if (skbs == NULL)
+		ret = -ENOMEM;
+	else {
+		for (e = 0; e < egrcnt; e++) {
+			/*
+			 * This is a bit tricky in that we allocate
+			 * extra space for 2 bytes of the 14 byte
+			 * ethernet header.  These two bytes are passed
+			 * in the ipath header so the rest of the data
+			 * is word aligned.  We allocate 4 bytes so that the
+			 * data buffer stays word aligned.
+			 * See ipath_kreceive() for more details.
+			 */
+			skbs[e] =
+			    __dev_alloc_skb(dd->ipath_ibmaxlen + 4, GFP_KERNEL);
+			if (skbs[e] == NULL) {
+				_IPATH_UNIT_ERROR(pd->port_unit,
+						  "SKB allocation error for eager TID %u\n",
+						  e);
+				while (e != 0)
+					dev_kfree_skb(skbs[--e]);
+				ret = -ENOMEM;
+				break;
+			}
+			skb_reserve(skbs[e], 4);
+		}
+	}
+	/*
+	 * after loop above, so we can test non-NULL
+	 * to see if ready to use at receive, etc.  Hope this fixes some
+	 * panics.
+	 */
+	dd->ipath_port0_skbs = skbs;
+
+	/*
+	 * have to tell chip each time we init it,
+	 * even if we are re-using previous memory.
+	 */
+	if (!ret) {
+		uint64_t lenvalid;	/* in words */
+
+		lenvalid = (dd->ipath_ibmaxlen - pd->port_egrskip) >> 2;
+		lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT;
+		lenvalid |= INFINIPATH_RT_VALID;
+		for (e = 0; e < egrcnt; e++) {
+			unsigned long pa, pent;
+
+			pa = virt_to_phys(dd->ipath_port0_skbs[e]->data);
+			pa += pd->port_egrskip;
+			if (!e && (pa & ~INFINIPATH_RT_ADDR_MASK))
+				_IPATH_INFO
+				    ("phys addr %lx has more than 40 bits, using only 40!!!\n",
+				     pa);
+			pent = (pa & INFINIPATH_RT_ADDR_MASK) | lenvalid;
+			/*
+			 * don't need this except extreme debugging,
+			 * but leaving to save future typing.
+			 * _IPATH_VDBG("egr[%d] %p <- %lx\n", e, &egrbase[e], pent);
+			 */
+			ipath_kput_memq(pd->port_unit, &egrbase[e], pent);
+		}
+		yield();	/* don't hog the cpu */
+	}
+
+	return ret;
+}
+
+/*
+ * this *must* be physically contiguous memory, and for now,
+ * that limits it to what kmalloc can do.
+ */
+static int ipath_create_rcvhdrq(ipath_portdata *pd)
+{
+	int i, ret = 0, amt, order, pgs;
+	char *qt;
+	struct page *p;
+	unsigned long pa, pa0;
+
+	amt = round_up(devdata[pd->port_unit].ipath_rcvhdrcnt
+		       * devdata[pd->port_unit].ipath_rcvhdrentsize *
+		       sizeof(uint32_t), PAGE_SIZE);
+	if (!pd->port_rcvhdrq) {
+		order = get_order(amt);
+		/*
+		 * not using REPEAT isn't viable; at 128KB, we can easily fail
+		 * this.  The problem with REPEAT is we can block here
+		 * "forever".  There isn't an in-between, unfortunately.
+		 * We could reduce the risk by never freeing the rcvhdrq
+		 * except at unload, but even then, the first time a
+		 * port is used, we could delay for some time...
+		 */
+		p = alloc_pages(GFP_USER, order);
+		if (!p) {
+			_IPATH_UNIT_ERROR(pd->port_unit,
+					  "attempt to allocate order %u memory for port %u rcvhdrq failed\n",
+					  order, pd->port_port);
+			return -ENOMEM;
+		}
+
+		/*
+		 * should use kmap (and later kunmap), even though high mem
+		 * will always be mapped on x86_64, to play it safe, but for
+		 * some bizarre reason these aren't exported symbols...
+		 */
+		pd->port_rcvhdrq = page_address(p);
+		if (!virt_addr_valid(pd->port_rcvhdrq)) {
+			_IPATH_DBG
+			    ("weird, virt_addr_valid false right after alloc_pages\n");
+			_IPATH_DBG("__pa(%p) is %lx, num_physpages %lx\n",
+				   pd->port_rcvhdrq, __pa(pd->port_rcvhdrq),
+				   num_physpages);
+		}
+		pd->port_rcvhdrq_phys = virt_to_phys(pd->port_rcvhdrq);
+		pd->port_rcvhdrq_order = order;
+
+		pa0 = pd->port_rcvhdrq_phys;
+		pgs = amt >> PAGE_SHIFT;
+		_IPATH_VDBG
+		    ("%d pages at %p (phys %lx) order=%u for port %u rcvhdr Q\n",
+		     pgs, pd->port_rcvhdrq, pa0, pd->port_rcvhdrq_order,
+		     pd->port_port);
+
+		/*
+		 * verify it's really physically contiguous, to be paranoid;
+		 * also mark pages as reserved, to avoid problems when a
+		 * user process that has them mapped exits.
+		 */
+		qt = pd->port_rcvhdrq;
+		SetPageReserved(virt_to_page(qt));
+		qt += PAGE_SIZE;
+		for (pa = pa0, i = 1; i < pgs; i++, qt += PAGE_SIZE) {
+			SetPageReserved(virt_to_page(qt));
+			pa = virt_to_phys(qt);
+			if (pa != (pa0 + (i * PAGE_SIZE)))
+				_IPATH_INFO
+				    ("pg %d at %p phys %lx not contiguous\n", i,
+				     qt, pa);
+			else
+				_IPATH_VDBG("pg %d at %p phys %lx\n", i, qt,
+					    pa);
+		}
+	}
+
+	/*
+	 * clear for security, sanity, and/or debugging (each time we
+	 * use/reuse)
+	 */
+	memset(pd->port_rcvhdrq, 0, amt);
+
+	/*
+	 * tell chip each time we init it, even if we are re-using previous
+	 * memory (we zero it at process close)
+	 */
+	_IPATH_VDBG("writing port %d rcvhdraddr as %lx\n", pd->port_port,
+		    pd->port_rcvhdrq_phys);
+	ipath_kput_kreg_port(pd->port_unit, kr_rcvhdraddr, pd->port_port,
+			     pd->port_rcvhdrq_phys);
+
+	return ret;
+}
+
+#ifdef _IPATH_EXTRA_DEBUG
+/*
+ * occasionally useful to dump the full set of kernel registers for debugging.
+ */
+static void ipath_dump_allregs(char *what, ipath_type t)
+{
+	uint16_t reg;
+	_IPATH_DBG("%s\n", what);
+	for (reg = 0; reg <= 0x100; reg++) {
+		uint64_t v = ipath_kget_kreg64(t, reg);
+		if (!(reg % 4))
+			printk("\n%3x: ", reg);
+		printk("%16llx ", v);
+	}
+	printk("\n");
+}
+#endif /* _IPATH_EXTRA_DEBUG */
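
ipath_create_rcvhdrq() above leans on alloc_pages() returning physically
contiguous memory and then double-checks it page by page. A minimal
standalone sketch of that paranoia check, assuming a pgs-page buffer whose
expected physical base is pa0 and a caller-supplied virt_to_phys()-style
translation (all names here are hypothetical):

    #ifndef PAGE_SIZE
    #define PAGE_SIZE 4096UL
    #endif

    /* returns 0 if every page sits at pa0 + i * PAGE_SIZE, -1 otherwise */
    static int check_contig(char *base, unsigned long pa0, int pgs,
                            unsigned long (*v2p)(const void *))
    {
        int i;

        for (i = 1; i < pgs; i++)
            if (v2p(base + (unsigned long)i * PAGE_SIZE) !=
                pa0 + (unsigned long)i * PAGE_SIZE)
                return -1;
        return 0;
    }
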
For the real<br />+ * hardware, this is done from the init routine called from the PCI<br />+ * infrastructure.<br />+ */<br />+int ipath_init_chip(const ipath_type t)<br />+{<br />+ int ret = 0, i;<br />+ uint32_t val32, kpiobufs;<br />+ uint64_t val, atmp;<br />+ volatile uint32_t *piobuf;<br />+ uint32_t pioincr;<br />+ ipath_devdata *dd = &devdata[t];<br />+ ipath_portdata *pd;<br />+ struct page *vpage;<br />+ char boardn[32];<br />+<br />+ /* first time only, set after static version info */<br />+ if (!chip_driver_version) {<br />+ i = strlen(ipath_core_version);<br />+ chip_driver_version = ipath_core_version + i;<br />+ chip_driver_size = sizeof ipath_core_version - i;<br />+ }<br />+<br />+ /*<br />+ * have to clear shadow copies of registers at init that are not<br />+ * otherwise set here, or all kinds of bizarre things happen with<br />+ * driver on chip reset<br />+ */<br />+ dd->ipath_rcvhdrsize = 0;<br />+<br />+ /*<br />+ * don't clear ipath_flags as 8bit mode was set before entering<br />+ * this func. However, we do set the linkstate to unknown<br />+ */<br />+<br />+ /* so we can watch for a transition */<br />+ dd->ipath_flags |= IPATH_LINKUNK;<br />+ dd->ipath_flags &= ~(IPATH_LINKACTIVE | IPATH_LINKARMED | IPATH_LINKDOWN<br />+ | IPATH_LINKINIT);<br />+<br />+ _IPATH_VDBG("Try to read spc chip revision\n");<br />+ dd->ipath_revision = ipath_kget_kreg64(t, kr_revision);<br />+<br />+ /*<br />+ * set up fundamental info we need to use the chip; we assume if<br />+ * the revision reg and these regs are OK, we don't need to special<br />+ * case the rest<br />+ */<br />+ dd->ipath_sregbase = ipath_kget_kreg32(t, kr_sendregbase);<br />+ dd->ipath_cregbase = ipath_kget_kreg32(t, kr_counterregbase);<br />+ dd->ipath_uregbase = ipath_kget_kreg32(t, kr_userregbase);<br />+ _IPATH_VDBG("ipath_kregbase %p, sendbase %x usrbase %x, cntrbase %x\n",<br />+ dd->ipath_kregbase, dd->ipath_sregbase, dd->ipath_uregbase,<br />+ dd->ipath_cregbase);<br />+ if ((dd->ipath_revision & 0xffffffff) == 0xffffffff ||<br />+ (dd->ipath_sregbase & 0xffffffff) == 0xffffffff ||<br />+ (dd->ipath_cregbase & 0xffffffff) == 0xffffffff ||<br />+ (dd->ipath_uregbase & 0xffffffff) == 0xffffffff) {<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Register read failures from chip, giving up initialization\n");<br />+ ret = -ENODEV;<br />+ goto done;<br />+ }<br />+<br />+ /* clear the initial reset flag, in case first driver load */<br />+ ipath_kput_kreg(t, kr_errorclear, INFINIPATH_E_RESET);<br />+<br />+ dd->ipath_portcnt = ipath_kget_kreg32(t, kr_portcnt);<br />+ if (!infinipath_cfgports)<br />+ dd->ipath_cfgports = dd->ipath_portcnt;<br />+ else if (infinipath_cfgports <= dd->ipath_portcnt) {<br />+ dd->ipath_cfgports = infinipath_cfgports;<br />+ _IPATH_DBG("Configured to use %u ports out of %u in chip\n",<br />+ dd->ipath_cfgports, dd->ipath_portcnt);<br />+ } else {<br />+ dd->ipath_cfgports = dd->ipath_portcnt;<br />+ _IPATH_DBG<br />+ ("Tried to configured to use %u ports; chip only supports %u\n",<br />+ infinipath_cfgports, dd->ipath_portcnt);<br />+ }<br />+ dd->ipath_pd = kmalloc(sizeof(*dd->ipath_pd) * dd->ipath_cfgports,<br />+ GFP_KERNEL);<br />+ if (!dd->ipath_pd) {<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Unable to allocate portdata array, failing\n");<br />+ ret = -ENOMEM;<br />+ goto done;<br />+ }<br />+ memset(dd->ipath_pd, 0, sizeof(*dd->ipath_pd) * dd->ipath_cfgports);<br />+<br />+ dd->ipath_lastegrheads = kmalloc(sizeof(*dd->ipath_lastegrheads)<br />+ * dd->ipath_cfgports, GFP_KERNEL);<br />+ 
dd->ipath_lastrcvhdrqtails = kmalloc(sizeof(*dd->ipath_lastrcvhdrqtails)<br />+ * dd->ipath_cfgports, GFP_KERNEL);<br />+ if (!dd->ipath_lastegrheads || !dd->ipath_lastrcvhdrqtails) {<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Unable to allocate head arrays, failing\n");<br />+ ret = -ENOMEM;<br />+ goto done;<br />+ }<br />+ memset(dd->ipath_lastrcvhdrqtails, 0,<br />+ sizeof(*dd->ipath_lastrcvhdrqtails)<br />+ * dd->ipath_cfgports);<br />+ memset(dd->ipath_lastegrheads, 0, sizeof(*dd->ipath_lastegrheads)<br />+ * dd->ipath_cfgports);<br />+<br />+ dd->ipath_pd[0] = kmalloc(sizeof(ipath_portdata), GFP_KERNEL);<br />+ if (!dd->ipath_pd[0]) {<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Unable to allocate portdata for port 0, failing\n");<br />+ ret = -ENOMEM;<br />+ goto done;<br />+ }<br />+ memset(dd->ipath_pd[0], 0, sizeof(ipath_portdata));<br />+<br />+ pd = dd->ipath_pd[0];<br />+ pd->port_unit = t;<br />+ pd->port_port = 0;<br />+ pd->port_cnt = 1;<br />+ /* The port 0 pkey table is used by the layer interface. */<br />+ pd->port_pkeys[0] = IPS_DEFAULT_P_KEY;<br />+<br />+ dd->ipath_rcvtidcnt = ipath_kget_kreg32(t, kr_rcvtidcnt);<br />+ dd->ipath_rcvtidbase = ipath_kget_kreg32(t, kr_rcvtidbase);<br />+ dd->ipath_rcvegrcnt = ipath_kget_kreg32(t, kr_rcvegrcnt);<br />+ dd->ipath_rcvegrbase = ipath_kget_kreg32(t, kr_rcvegrbase);<br />+ dd->ipath_palign = ipath_kget_kreg32(t, kr_pagealign);<br />+ dd->ipath_piobufbase = ipath_kget_kreg32(t, kr_sendpiobufbase);<br />+ dd->ipath_piosize = ipath_kget_kreg32(t, kr_sendpiosize);<br />+ dd->ipath_ibmtu = 4096; /* default to largest legal MTU */<br />+ dd->ipath_piobcnt = ipath_kget_kreg32(t, kr_sendpiobufcnt);<br />+<br />+ _IPATH_VDBG<br />+ ("Revision %llx (PCI %x), %u ports, %u tids, %u egrtids, %u piobufs\n",<br />+ dd->ipath_revision, dd->ipath_pcirev, dd->ipath_portcnt,<br />+ dd->ipath_rcvtidcnt, dd->ipath_rcvegrcnt, dd->ipath_piobcnt);<br />+<br />+ if (((dd->ipath_revision >> INFINIPATH_R_SOFTWARE_SHIFT) & INFINIPATH_R_SOFTWARE_MASK) != IPATH_CHIP_SWVERSION) { /* >= maybe, someday */<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Driver only handles version %d, chip swversion is %d (%llx), failng\n",<br />+ IPATH_CHIP_SWVERSION,<br />+ (int)(dd-><br />+ ipath_revision >><br />+ INFINIPATH_R_SOFTWARE_SHIFT) &<br />+ INFINIPATH_R_SOFTWARE_MASK,<br />+ dd->ipath_revision);<br />+ ret = -ENOSYS;<br />+ goto done;<br />+ }<br />+ dd->ipath_majrev = (uint8_t) ((dd->ipath_revision >><br />+ INFINIPATH_R_CHIPREVMAJOR_SHIFT) &<br />+ INFINIPATH_R_CHIPREVMAJOR_MASK);<br />+ dd->ipath_minrev =<br />+ (uint8_t) ((dd-><br />+ ipath_revision >> INFINIPATH_R_CHIPREVMINOR_SHIFT) &<br />+ INFINIPATH_R_CHIPREVMINOR_MASK);<br />+ dd->ipath_boardrev =<br />+ (uint8_t) ((dd-><br />+ ipath_revision >> INFINIPATH_R_BOARDID_SHIFT) &<br />+ INFINIPATH_R_BOARDID_MASK);<br />+<br />+ ipath_get_boardname(t, boardn, sizeof boardn);<br />+<br />+ {<br />+ snprintf(chip_driver_version, chip_driver_size,<br />+ "Driver %u.%u, %s, InfiniPath%u %u.%u, PCI %u, SW Compat %u\n",<br />+ IPATH_CHIP_VERS_MAJ, IPATH_CHIP_VERS_MIN, boardn,<br />+ (unsigned)(dd-><br />+ ipath_revision >> INFINIPATH_R_ARCH_SHIFT) &<br />+ INFINIPATH_R_ARCH_MASK, dd->ipath_majrev,<br />+ dd->ipath_minrev, dd->ipath_pcirev,<br />+ (unsigned)(dd-><br />+ ipath_revision >><br />+ INFINIPATH_R_SOFTWARE_SHIFT) &<br />+ INFINIPATH_R_SOFTWARE_MASK);<br />+<br />+ }<br />+<br />+ _IPATH_DBG("%s", chip_driver_version);<br />+<br />+ /*<br />+ * we ignore most issues after reporting them, but have to specially<br />+ * handle 
hardware-disabled chips.<br />+ */<br />+ if(ipath_validate_rev(dd) == 2) {<br />+ ret = -EPERM; /* unique error, known to infinipath_init_one() */<br />+ goto done;<br />+ }<br />+<br />+ /*<br />+ * zero all the TID entries at startup. We do this for sanity,<br />+ * in case of a previous driver crash of some kind, and also<br />+ * because the chip powers up with these memories in an unknown<br />+ * state. Use portcnt, not cfgports, since this is for the full chip,<br />+ * not for current (possibly different) configuration value<br />+ * Chip Errata bug 6447<br />+ */<br />+ for (val32 = 0; val32 < dd->ipath_portcnt; val32++)<br />+ ipath_clear_tids(t, val32);<br />+<br />+ dd->ipath_rcvhdrentsize = IPATH_RCVHDRENTSIZE;<br />+ /* we could bump this<br />+ * to allow for full rcvegrcnt + rcvtidcnt, but then it no<br />+ * longer nicely fits power of two, and since we now use<br />+ * alloc_pages, the rest would be wasted.<br />+ */<br />+ dd->ipath_rcvhdrcnt = dd->ipath_rcvegrcnt;<br />+ /*<br />+ * setup offset of last valid entry in rcvhdrq, for various tests, to<br />+ * avoid calculating each time we need it<br />+ */<br />+ dd->ipath_hdrqlast =<br />+ dd->ipath_rcvhdrentsize * (dd->ipath_rcvhdrcnt - 1);<br />+ ipath_kput_kreg(t, kr_rcvhdrentsize, dd->ipath_rcvhdrentsize);<br />+ ipath_kput_kreg(t, kr_rcvhdrcnt, dd->ipath_rcvhdrcnt);<br />+ /*<br />+ * not in ipath_rcvhdrsize, so user programs can set differently, but<br />+ * so any early packets see the default size.<br />+ */<br />+ ipath_kput_kreg(t, kr_rcvhdrsize, IPATH_DFLT_RCVHDRSIZE);<br />+<br />+ /*<br />+ * we "know" that this works<br />+ * out OK. It's actually a bit more than we need, but 2048+64 isn't<br />+ * quite enough for full size, and we want the +N to be a power of 2<br />+ * to give us reasonable alignment and fit within page_alloc()'ed<br />+ * memory<br />+ */<br />+ dd->ipath_rcvegrbufsize = dd->ipath_piosize;<br />+<br />+ /*<br />+ * the min() check here is currently a nop, but it may not always be,<br />+ * depending on just how we do ipath_rcvegrbufsize<br />+ */<br />+ dd->ipath_ibmaxlen = min(dd->ipath_piosize, dd->ipath_rcvegrbufsize);<br />+ dd->ipath_init_ibmaxlen = dd->ipath_ibmaxlen;<br />+<br />+ /*<br />+ * set up the shadow copies of the piobufavail registers, which<br />+ * we compare against the chip registers for now, and the in<br />+ * memory DMA'ed copies of the registers. 
+	if (dd->ipath_pioavregs > (sizeof(dd->ipath_pioavailshadow) /
+				   sizeof(dd->ipath_pioavailshadow[0]))) {
+		dd->ipath_pioavregs = sizeof(dd->ipath_pioavailshadow) /
+			sizeof(dd->ipath_pioavailshadow[0]);
+		/* 2 bits per buffer */
+		dd->ipath_piobcnt = dd->ipath_pioavregs *
+			sizeof(uint64_t) * _BITS_PER_BYTE >> 1;
+		_IPATH_INFO(
+			"Warning: %llu piobufs is too many to fit in shadow, only using %u\n",
+			val, dd->ipath_piobcnt);
+	}
+
+	if (!infinipath_kpiobufs) {
+		/* have to have at least one, for SMA */
+		kpiobufs = infinipath_kpiobufs = 1;
+	} else if (dd->ipath_piobcnt <
+		   (dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT)) {
+		_IPATH_INFO(
+			"Too few PIO buffers (%u) for %u ports to have %u each!\n",
+			dd->ipath_piobcnt, dd->ipath_cfgports,
+			IPATH_MIN_USER_PORT_BUFCNT);
+		kpiobufs = 1;	/* reserve just the minimum for SMA/ether */
+	} else
+		kpiobufs = infinipath_kpiobufs;
+
+	if (kpiobufs > (dd->ipath_piobcnt -
+			(dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT))) {
+		i = dd->ipath_piobcnt -
+			(dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT);
+		if (i < 0)
+			i = 0;
+		_IPATH_INFO(
+			"Allocating %d PIO bufs for kernel leaves too few for %d user ports (%d each); using %u\n",
+			kpiobufs, dd->ipath_cfgports - 1,
+			IPATH_MIN_USER_PORT_BUFCNT, i);
+		/*
+		 * shouldn't change infinipath_kpiobufs, because it could
+		 * be different for different devices...
+		 */
+		kpiobufs = i;
+	}
+	dd->ipath_lastport_piobuf = dd->ipath_piobcnt - kpiobufs;
+	dd->ipath_pbufsport = dd->ipath_cfgports > 1 ?
+		dd->ipath_lastport_piobuf / (dd->ipath_cfgports - 1) : 0;
+	val32 = dd->ipath_lastport_piobuf -
+		(dd->ipath_pbufsport * (dd->ipath_cfgports - 1));
+	if (val32 > 0) {
+		_IPATH_DBG(
+			"allocating %u pbufs/port leaves %u unused, add to kernel\n",
+			dd->ipath_pbufsport, val32);
+		dd->ipath_lastport_piobuf -= val32;
+	}
+	dd->ipath_lastpioindex = dd->ipath_lastport_piobuf;
+	_IPATH_VDBG("%d PIO bufs %u - %u, %u each for %u user ports\n",
+		    kpiobufs, dd->ipath_lastport_piobuf, dd->ipath_piobcnt,
+		    dd->ipath_pbufsport, dd->ipath_cfgports - 1);
+
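+	/*
+	 * e.g. (hypothetical counts) 128 PIO buffers with kpiobufs = 32
+	 * and 5 configured ports: lastport_piobuf = 96, and the 4 user
+	 * ports get 96 / 4 = 24 buffers each with nothing left over.
+	 */
+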
+	/*
+	 * this has to be page aligned, and on a page of its own, so we
+	 * can map it into user space. We also use it to give processes
+	 * a copy of ipath_statusp, on a separate cacheline, followed by
+	 * a copy of the freeze error string, if it's happened. Might also
+	 * use that space for other things.
+	 */
+	val = round_up(2 * L1_CACHE_BYTES + sizeof(*dd->ipath_statusp) +
+		       dd->ipath_pioavregs * sizeof(uint64_t), 2 * PAGE_SIZE);
+	dd->ipath_pioavailregs_dma = kmalloc(val * sizeof(uint64_t),
+					     GFP_KERNEL);
+	if (!dd->ipath_pioavailregs_dma) {
+		_IPATH_UNIT_ERROR(t,
+			"failed to allocate PIOavail reg area in memory\n");
+		ret = -ENOMEM;
+		goto done;
+	}
+	if ((PAGE_SIZE - 1) & (uint64_t) dd->ipath_pioavailregs_dma) {
+		dd->__ipath_pioavailregs_base = dd->ipath_pioavailregs_dma;
+		dd->ipath_pioavailregs_dma = (uint64_t *)
+			round_up((uint64_t) dd->ipath_pioavailregs_dma,
+				 PAGE_SIZE);
+	} else
+		dd->__ipath_pioavailregs_base = dd->ipath_pioavailregs_dma;
+	/*
+	 * zero it initially, since the whole thing is mapped into user
+	 * space and we don't want an info leak, or confusing garbage
+	 */
+	memset((void *)dd->ipath_pioavailregs_dma, 0, PAGE_SIZE);
+
+	/*
+	 * we really want L2 cache aligned, but for current CPUs of
+	 * interest, they are the same.
+	 */
+	dd->ipath_statusp = (uint64_t *)
+		((char *)dd->ipath_pioavailregs_dma +
+		 ((2 * L1_CACHE_BYTES +
+		   dd->ipath_pioavregs * sizeof(uint64_t)) &
+		  ~(L1_CACHE_BYTES - 1)));	/* align to a cacheline */
+	/* copy the current value now that it's really allocated */
+	*dd->ipath_statusp = dd->_ipath_status;
+	/*
+	 * setup buffer to hold freeze msg, accessible to apps, following
+	 * statusp
+	 */
+	dd->ipath_freezemsg = (char *)&dd->ipath_statusp[1];
+	/* and its length */
+	dd->ipath_freezelen = L1_CACHE_BYTES - sizeof(dd->ipath_statusp[0]);
+
+	atmp = virt_to_phys(dd->ipath_pioavailregs_dma);
+	/* stash physical address for user progs */
+	dd->ipath_pioavailregs_phys = atmp;
+	(void)ipath_kput_kreg(t, kr_sendpioavailaddr, atmp);
+	/*
+	 * this is to detect s/w errors, which the h/w works around by
+	 * ignoring the low 6 bits of address, if it wasn't aligned.
+	 */
+	val = ipath_kget_kreg64(t, kr_sendpioavailaddr);
+	if (val != atmp) {
+		_IPATH_UNIT_ERROR(t,
+			"Catastrophic software error, SendPIOAvailAddr written as %llx, read back as %llx\n",
+			atmp, val);
+		ret = -EINVAL;
+		goto done;
+	}
+
+	if (t * 64 > (sizeof(ipath_port0_rcvhdrtail) - 64)) {
+		_IPATH_UNIT_ERROR(t,
+			"unit %u too large for port 0 rcvhdrtail buffer size\n",
+			t);
+		ret = -ENODEV;
+		goto done;
+	}
+
+	/*
+	 * kernel modules are loaded into vmalloc'ed memory; verify that
+	 * when we assume that, map to phys, and back to virt, we get the
+	 * right contents, so we did the mapping right.
+	 */
+	vpage = vmalloc_to_page((void *)ipath_port0_rcvhdrtail);
+	if (!vpage) {
+		_IPATH_UNIT_ERROR(t, "vmalloc_to_page for rcvhdrtail fails!\n");
+		ret = -ENOMEM;
+		goto done;
+	}
+
+	/*
+	 * 64 is driven by cache line size, and also by the chip
+	 * requirement that the low 6 bits be 0
+	 */
+	val = page_to_phys(vpage) + t * 64;
+
+	/* verify that the alignment requirement was met */
+	ipath_kput_kreg_port(t, kr_rcvhdrtailaddr, 0, val);
+	atmp = ipath_kget_kreg64_port(t, kr_rcvhdrtailaddr, 0);
+	if (val != atmp) {
+		_IPATH_UNIT_ERROR(t,
+			"Catastrophic software error, RcvHdrTailAddr0 written as %llx, read back as %llx from %x\n",
+			val, atmp, kr_rcvhdrtailaddr);
+		ret = -EINVAL;
+		goto done;
+	}
+
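+	/*
+	 * each unit gets its own 64-byte slot in the shared port 0
+	 * rcvhdrtail buffer, so (assuming 8-byte entries) the index
+	 * stride below works out to 64 / 8 = 8 entries per unit.
+	 */
+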
+	/* so we can get the current tail in ipath_kreceive(), per chip */
+	dd->ipath_hdrqtailptr =
+		&ipath_port0_rcvhdrtail[t *
+			(64 / sizeof(ipath_port0_rcvhdrtail[0]))];
+
+	ipath_kput_kreg(t, kr_rcvbthqp, IPATH_KD_QP);
+
+	/*
+	 * make sure we are not in freeze, and PIO send enabled, so
+	 * writes to the pbc happen
+	 */
+	ipath_kput_kreg(t, kr_hwerrmask, 0ULL);
+	ipath_kput_kreg(t, kr_hwerrclear, ~0ULL);
+	ipath_kput_kreg(t, kr_control, 0ULL);
+	ipath_kput_kreg(t, kr_sendctrl, INFINIPATH_S_PIOENABLE);
+
+	/*
+	 * write the pbc of each buffer, to be sure it's initialized, then
+	 * cancel all the buffers, and also abort any packets that might
+	 * have been in flight for some reason (the latter is for driver
+	 * unload/reload, but isn't a bad idea at first init). The IB
+	 * link isn't up yet at this point, so there is no danger of
+	 * sending these out on the wire.
+	 * Chip Errata bug 6610
+	 */
+	piobuf = (uint32_t *) (((char *)(dd->ipath_kregbase)) +
+			       dd->ipath_piobufbase);
+	pioincr = dd->ipath_palign / sizeof(*piobuf);
+	for (i = 0; i < dd->ipath_piobcnt; i++) {
+		*piobuf = 16;	/* reasonable word count, just to init pbc */
+		piobuf += pioincr;
+	}
+	/* self-clearing */
+	ipath_kput_kreg(t, kr_sendctrl, INFINIPATH_S_ABORT);
+
+	/*
+	 * bring up the link before the error clears, since we expect
+	 * serdes pll errors during this, the first time after reset
+	 */
+	if (ipath_bringup_link(t)) {
+		_IPATH_INFO("Failed to bring up IB link\n");
+		ret = -ENETDOWN;
+		goto done;
+	}
+
+	/*
+	 * clear any "expected" hwerrs from reset and/or initialization,
+	 * clear any that aren't enabled (at least this once), and then
+	 * set the enable mask
+	 */
+	ipath_clear_init_hwerrs(t);
+	ipath_kput_kreg(t, kr_hwerrclear, ~0ULL);
+	ipath_kput_kreg(t, kr_hwerrmask, dd->ipath_hwerrmask);
+
+	dd->ipath_maskederrs = dd->ipath_ignorederrs;
+	ipath_kput_kreg(t, kr_errorclear, ~0ULL);	/* clear all */
+	/* enable all errors we aren't ignoring, at least this first time. */
+	ipath_kput_kreg(t, kr_errormask, ~dd->ipath_maskederrs);
+	/* clear any interrupts up to this point (ints still not enabled) */
+	ipath_kput_kreg(t, kr_intclear, ~0ULL);
+
+	ipath_stats.sps_lid[t] = dd->ipath_lid;
+
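+	/*
+	 * e.g. (hypothetical counts) 1024 TIDs per port at 8 bytes per
+	 * page pointer is the "8 KB per port" mentioned below; 5
+	 * configured ports would then vmalloc 5 * 8 KB = 40 KB.
+	 */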
+	/*
+	 * allocate the shadow TID array, so we can ipath_munlock previous
+	 * entries. It may make more sense to move the pageshadow to the
+	 * port data structure, so we only allocate memory for ports
+	 * actually in use, since we are at 8 KB per port now.
+	 */
+	dd->ipath_pageshadow = (struct page **)
+		vmalloc(dd->ipath_cfgports * dd->ipath_rcvtidcnt *
+			sizeof(struct page *));
+	if (!dd->ipath_pageshadow)
+		_IPATH_UNIT_ERROR(t,
+			"failed to allocate shadow page * array, no expected sends!\n");
+	else
+		memset(dd->ipath_pageshadow, 0,
+		       dd->ipath_cfgports * dd->ipath_rcvtidcnt *
+		       sizeof(struct page *));
+
+	/* set up the port 0 (kernel) rcvhdr q and egr TIDs */
+	if (!(ret = ipath_create_rcvhdrq(dd->ipath_pd[0])))
+		ret = ipath_create_port0_egr(dd->ipath_pd[0]);
+	if (ret)
+		_IPATH_UNIT_ERROR(t,
+			"failed to allocate port 0 (kernel) rcvhdrq and/or egr bufs\n");
+	else {
+		init_waitqueue_head(&ipath_sma_wait);
+		init_waitqueue_head(&ipath_sma_state_wait);
+
+		ipath_kput_kreg(pd->port_unit, kr_rcvctrl, dd->ipath_rcvctrl);
+
+		ipath_kput_kreg(t, kr_rcvbthqp, IPATH_KD_QP);
+
+		/* Enable PIO send, and update of PIOavail regs to memory. */
+		dd->ipath_sendctrl = INFINIPATH_S_PIOENABLE |
+			INFINIPATH_S_PIOBUFAVAILUPD;
+		ipath_kput_kreg(t, kr_sendctrl, dd->ipath_sendctrl);
+
+		/*
+		 * enable port 0 receive, and receive interrupt; other
+		 * ports are done as the user opens and inits them
+		 */
+		dd->ipath_rcvctrl = INFINIPATH_R_TAILUPD |
+			(1ULL << INFINIPATH_R_PORTENABLE_SHIFT) |
+			(1ULL << INFINIPATH_R_INTRAVAIL_SHIFT);
+		ipath_kput_kreg(t, kr_rcvctrl, dd->ipath_rcvctrl);
+
+		/*
+		 * now ready for use. This should be cleared whenever we
+		 * detect a reset, or initiate one.
+		 */
+		dd->ipath_flags |= IPATH_INITTED;
+
+		/*
+		 * init our shadow copies of head from tail values, and
+		 * write head values to match
+		 */
+		val32 = ipath_kget_ureg32(t, ur_rcvegrindextail, 0);
+		(void)ipath_kput_ureg(t, ur_rcvegrindexhead, val32, 0);
+		dd->ipath_port0head = ipath_kget_ureg32(t, ur_rcvhdrtail, 0);
+		(void)ipath_kput_ureg(t, ur_rcvhdrhead, dd->ipath_port0head, 0);
+
+		/*
+		 * by now pioavail updates to memory should have occurred,
+		 * so copy them into our working/shadow registers; this is
+		 * in case something went wrong with abort, but mostly to
+		 * get the initial values of the generation bit correct
+		 */
+		for (i = 0; i < dd->ipath_pioavregs; i++) {
+			/*
+			 * Chip Errata bug 6641; even and odd qwords > 3
+			 * are swapped
+			 */
+			if (i > 3) {
+				if (i & 1)
+					dd->ipath_pioavailshadow[i] =
+						dd->ipath_pioavailregs_dma[i - 1];
+				else
+					dd->ipath_pioavailshadow[i] =
+						dd->ipath_pioavailregs_dma[i + 1];
+			} else
+				dd->ipath_pioavailshadow[i] =
+					dd->ipath_pioavailregs_dma[i];
+		}
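+		/*
+		 * i.e. after the swap above, shadow[4] holds dma[5],
+		 * shadow[5] holds dma[4], and so on; qwords 0-3 are
+		 * copied straight across.
+		 */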
+		/* can get counters, stats, etc. */
+		dd->ipath_flags |= IPATH_PRESENT;
+	}
+
+	/*
+	 * cause retrigger of pending interrupts ignored during init,
+	 * even if we had errors
+	 */
+	ipath_kput_kreg(t, kr_intclear, 0ULL);
+
+	/*
+	 * set up the stats retrieval timer, even if we had errors in the
+	 * last portion of setup
+	 */
+	init_timer(&dd->ipath_stats_timer);
+	dd->ipath_stats_timer.function = ipath_get_faststats;
+	dd->ipath_stats_timer.data = (unsigned long)t;
+	/* every 5 seconds; takes ~16 seconds to overflow at full IB 4x bandwidth */
+	dd->ipath_stats_timer.expires = jiffies + 5 * HZ;
+	add_timer(&dd->ipath_stats_timer);
+
+	dd->ipath_stats_timer_active = 1;
+
+done:
+	if (!ret) {
+		ipath_get_guid(t);
+		*dd->ipath_statusp |= IPATH_STATUS_CHIP_PRESENT;
+		if (!ipath_sma_data_spare) {
+			/* first init, setup SMA data structs */
+			ipath_sma_data_spare =
+				ipath_sma_data_bufs[IPATH_NUM_SMAPKTS];
+			for (i = 0; i < IPATH_NUM_SMAPKTS; i++)
+				ipath_sma_data[i].buf = ipath_sma_data_bufs[i];
+		}
+		/*
+		 * sps_nports is a global, so we set it to the highest
+		 * number of ports of any of the chips we find; we never
+		 * decrement it, at least for now.
+		 */
+		if (dd->ipath_cfgports > ipath_stats.sps_nports)
+			ipath_stats.sps_nports = dd->ipath_cfgports;
+	}
+	/* if ret is non-zero, we probably should do some cleanup here... */
+	return ret;
+}
+
+int ipath_waitfor_complete(const ipath_type t, ipath_kreg reg_id,
+			   uint64_t bits_to_wait_for, uint64_t *valp)
+{
+	uint64_t timeout, lastval, val;
+
+	lastval = ipath_kget_kreg64(t, reg_id);
+	timeout = get_cycles() + 0x10000000ULL;	/* <- ridiculously long time */
+	do {
+		val = ipath_kget_kreg64(t, reg_id);
+		*valp = val;	/* so they have something, even on failures. */
+		if ((val & bits_to_wait_for) == bits_to_wait_for)
+			return 0;
+		if (val != lastval) {
+			_IPATH_VDBG(
+				"Changed from %llx to %llx, waiting for %llx bits\n",
+				lastval, val, bits_to_wait_for);
+			lastval = val;
+		}
+		yield();
+		if (get_cycles() > timeout) {
+			_IPATH_DBG(
+				"Didn't get bits %llx in register 0x%x, got %llx\n",
+				bits_to_wait_for, reg_id, *valp);
+			return ENODEV;
+		}
+	} while (1);
+}
+
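+/*
+ * Typical use of ipath_waitfor_complete() above (register and bit names
+ * hypothetical): poll until a ready bit asserts, bailing out on timeout
+ * with the last value read still available in val:
+ *
+ *	if (ipath_waitfor_complete(t, kr_somestatus, SOME_READY_BIT, &val))
+ *		_IPATH_DBG("never became ready (%llx)\n", val);
+ */
+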
+/*
+ * like ipath_waitfor_complete(), but we wait for the CMDVALID bit to go
+ * away, indicating the last command has completed. It doesn't return data.
+ */
+int ipath_waitfor_mdio_cmdready(const ipath_type t)
+{
+	uint64_t timeout;
+	uint64_t val;
+
+	timeout = get_cycles() + 0x10000000ULL;	/* <- ridiculously long time */
+	do {
+		val = ipath_kget_kreg64(t, kr_mdio);
+		if (!(val & IPATH_MDIO_CMDVALID))
+			return 0;
+		yield();
+		if (get_cycles() > timeout) {
+			_IPATH_DBG("CMDVALID stuck in mdio reg? (%llx)\n", val);
+			return ENODEV;
+		}
+	} while (1);
+}
+
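+/*
+ * ipath_set_ib_lstate() below accepts either one of the unshifted
+ * INFINIPATH_IBCC_LINKCMD_* values (INIT, ARMED, ACTIVE), or a
+ * LINKINITCMD value already shifted into position (for down, disable,
+ * sleep, etc.); anything else is logged as an unknown transition.
+ */
+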
+void ipath_set_ib_lstate(const ipath_type t, int which)
+{
+	ipath_devdata *dd = &devdata[t];
+	char *what;
+
+	/*
+	 * For all cases, we'll either be setting a new value of linkcmd,
+	 * or we want it to be NOP, so clear it here.
+	 * Similarly, we want the linkinitcmd to be NOP for everything
+	 * other than explicitly changing linkinitcmd, and for that case,
+	 * we want to first clear any existing bits.
+	 */
+	dd->ipath_ibcctrl &= ~((INFINIPATH_IBCC_LINKCMD_MASK <<
+				INFINIPATH_IBCC_LINKCMD_SHIFT) |
+			       (INFINIPATH_IBCC_LINKINITCMD_MASK <<
+				INFINIPATH_IBCC_LINKINITCMD_SHIFT));
+
+	if (which == INFINIPATH_IBCC_LINKCMD_INIT) {
+		dd->ipath_flags &= ~(IPATH_LINK_TOARMED | IPATH_LINK_TOACTIVE
+				     | IPATH_LINK_SLEEPING);
+		/* so we can watch for a transition */
+		dd->ipath_flags |= IPATH_LINKDOWN;
+		what = "INIT";
+	} else if (which == INFINIPATH_IBCC_LINKCMD_ARMED) {
+		dd->ipath_flags |= IPATH_LINK_TOARMED;
+		dd->ipath_flags &= ~(IPATH_LINK_TOACTIVE | IPATH_LINK_SLEEPING);
+		/*
+		 * this is mainly for loopback testing. If INITCMD is
+		 * NOP or SLEEP, the link won't ever come up in loopback...
+		 */
+		if (!(dd->ipath_flags & (IPATH_LINKINIT | IPATH_LINKARMED |
+					 IPATH_LINKACTIVE))) {
+			_IPATH_SMADBG(
+				"going to armed, but link not yet up, set POLL\n");
+			dd->ipath_ibcctrl |=
+				INFINIPATH_IBCC_LINKINITCMD_POLL <<
+				INFINIPATH_IBCC_LINKINITCMD_SHIFT;
+		}
+		what = "ARMED";
+	} else if (which == INFINIPATH_IBCC_LINKCMD_ACTIVE) {
+		dd->ipath_flags |= IPATH_LINK_TOACTIVE;
+		dd->ipath_flags &= ~(IPATH_LINK_TOARMED | IPATH_LINK_SLEEPING);
+		what = "ACTIVE";
+	} else if (which & (INFINIPATH_IBCC_LINKINITCMD_MASK <<
+			    INFINIPATH_IBCC_LINKINITCMD_SHIFT)) {
+		/* down, disable, etc. */
+		dd->ipath_flags &= ~(IPATH_LINK_TOARMED | IPATH_LINK_TOACTIVE);
+		if (((which >> INFINIPATH_IBCC_LINKINITCMD_SHIFT) &
+		     INFINIPATH_IBCC_LINKINITCMD_MASK) ==
+		    INFINIPATH_IBCC_LINKINITCMD_SLEEP) {
+			dd->ipath_flags |= IPATH_LINK_SLEEPING | IPATH_LINKDOWN;
+		} else
+			dd->ipath_flags |= IPATH_LINKDOWN;
+		dd->ipath_ibcctrl |=
+			which & (INFINIPATH_IBCC_LINKINITCMD_MASK <<
+				 INFINIPATH_IBCC_LINKINITCMD_SHIFT);
+		what = "DOWN";
+	} else {
+		what = "UNKNOWN";
+		_IPATH_INFO("Unknown link transition requested (which=0x%x)\n",
+			    which);
+	}
+
+	dd->ipath_ibcctrl |= ((uint64_t) which & INFINIPATH_IBCC_LINKCMD_MASK)
+		<< INFINIPATH_IBCC_LINKCMD_SHIFT;
+
+	_IPATH_SMADBG("Trying to move unit %u to %s, current ltstate is %s\n",
+		      t, what,
+		      ipath_ibcstatus_str[(ipath_kget_kreg64(t, kr_ibcstatus)
+			  >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT)
+			  & INFINIPATH_IBCS_LINKTRAININGSTATE_MASK]);
+	ipath_kput_kreg(t, kr_ibcctrl, dd->ipath_ibcctrl);
+}
+
+static int ipath_bringup_link(const ipath_type t)
+{
+	ipath_devdata *dd = &devdata[t];
+	uint64_t val, ibc;
+	int ret = 0;
+
+	dd->ipath_control &= ~INFINIPATH_C_LINKENABLE;	/* hold IBC in reset */
+	ipath_kput_kreg(t, kr_control, dd->ipath_control);
+
+	/*
+	 * Note that prior to try 14 or 15 of IB, the credit scaling
+	 * wasn't working, because it was swapped for writes with the
+	 * 1 bit default linkstate field
+	 */
+
+	/* ignore pbc and align word */
+	val = dd->ipath_piosize - 2 * sizeof(uint32_t);
+	/*
+	 * add one for the ICRC, which we only send in diag test pkt mode,
+	 * and don't need to worry about for mtu
+	 */
+	val += 1;
+	/*
+	 * set the IBC maxpktlength to the size of our pio buffers; the
+	 * maxpktlength is in words. This is *not* the IB data MTU.
+	 */
+	ibc = (val / sizeof(uint32_t)) << INFINIPATH_IBCC_MAXPKTLEN_SHIFT;
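+	/*
+	 * e.g. with hypothetical 2048-byte PIO buffers: val = 2048 - 8 + 1
+	 * = 2041 bytes, giving a maxpktlen of 2041 / 4 = 510 words.
+	 */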
+	/* flow control watermark, in KB */
+	ibc |= 0x5ULL << INFINIPATH_IBCC_FLOWCTRLWATERMARK_SHIFT;
+	/*
+	 * how often flowctrl is sent, more or less in usecs; balance
+	 * against the watermark value, so that in theory senders always
+	 * get a flow control update in time to not let the IB link go idle.
+	 */
+	ibc |= 0x3ULL << INFINIPATH_IBCC_FLOWCTRLPERIOD_SHIFT;
+	/* max error tolerance */
+	ibc |= 0xfULL << INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT;
+	/* use "real" buffer space for IB credit flow control */
+	ibc |= 4ULL << INFINIPATH_IBCC_CREDITSCALE_SHIFT;
+	/* overrun error tolerance */
+	ibc |= 0xfULL << INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT;
+	/* initially come up waiting for TS1, without sending anything. */
+	dd->ipath_ibcctrl = ibc;
+	/* don't put linkinitcmd in ipath_ibcctrl, want that to stay a NOP */
+	ibc |= INFINIPATH_IBCC_LINKINITCMD_SLEEP <<
+		INFINIPATH_IBCC_LINKINITCMD_SHIFT;
+	dd->ipath_flags |= IPATH_LINK_SLEEPING;
+	ipath_kput_kreg(t, kr_ibcctrl, ibc);
+
+	ret = ipath_bringup_serdes(t);
+
+	if (ret)
+		_IPATH_INFO("Could not initialize SerDes, not usable\n");
+	else {
+		dd->ipath_control |= INFINIPATH_C_LINKENABLE;	/* enable IBC */
+		ipath_kput_kreg(t, kr_control, dd->ipath_control);
+	}
+
+	return ret;
+}
+
+/*
+ * called from ipath_shutdown_link(), and from the sma doing a LINKDOWN.
+ * Left as a separate function for historical reasons; we may want it to
+ * do more than just call ipath_set_ib_lstate() again sometime in the
+ * future.
+ */
+void ipath_down_link(const ipath_type t)
+{
+	ipath_set_ib_lstate(t, INFINIPATH_IBCC_LINKINITCMD_SLEEP <<
+			    INFINIPATH_IBCC_LINKINITCMD_SHIFT);
+}
+
+/*
+ * do this when the driver is being unloaded, or perhaps for diags, and
+ * maybe when we get an interrupt for a fatal link error that requires
+ * bringing the link down and back up
+ */
+static int ipath_shutdown_link(const ipath_type t)
+{
+	uint64_t val;
+	ipath_devdata *dd = &devdata[t];
+	int ret = 0;
+
+	_IPATH_DBG("Shutting down the link\n");
+	ipath_down_link(t);
+
+	/*
+	 * we are shutting down, so tell the layered driver. We don't do
+	 * this on just a link state change; much as with ethernet, a
+	 * cable unplug etc. doesn't change driver state.
+	 */
+	if (dd->ipath_layer.l_intr)
+		dd->ipath_layer.l_intr(t, IPATH_LAYER_INT_IF_DOWN);
+
+	dd->ipath_control &= ~INFINIPATH_C_LINKENABLE;	/* disable IBC */
+	ipath_kput_kreg(t, kr_control, dd->ipath_control);
+
+	*dd->ipath_statusp &= ~(IPATH_STATUS_IB_CONF | IPATH_STATUS_IB_READY);
+
+	/*
+	 * clear SerdesEnable and turn the LEDs off; do this here because
+	 * we are unloading, so don't count on interrupts to move along
+	 */
+	ipath_quiet_serdes(t);
+	val = dd->ipath_extctrl &
+		~(INFINIPATH_EXTC_LEDPRIPORTGREENON |
+		  INFINIPATH_EXTC_LEDPRIPORTYELLOWON);
+	dd->ipath_extctrl = val;
+	ipath_kput_kreg(t, kr_extctrl, val);
+
+	if (dd->ipath_stats_timer_active) {
+		del_timer_sync(&dd->ipath_stats_timer);
+		dd->ipath_stats_timer_active = 0;
+	}
+	if (*dd->ipath_statusp & IPATH_STATUS_CHIP_PRESENT) {
+		/* can't do anything more with chip; needs re-init */
+		*dd->ipath_statusp &= ~IPATH_STATUS_CHIP_PRESENT;
+		if (dd->ipath_kregbase) {
+			/*
+			 * if we haven't already cleaned up before, these
+			 * are set to ensure any register reads/writes
+			 * "fail" until re-init
+			 */
+			dd->ipath_kregbase = NULL;
+			dd->ipath_kregvirt = NULL;
+			dd->ipath_uregbase = 0ULL;
+			dd->ipath_sregbase = 0ULL;
+			dd->ipath_cregbase = 0ULL;
+			dd->ipath_kregsize = 0;
+		}
+#ifdef CONFIG_MTRR
+		if (dd->ipath_mtrr) {
+			_IPATH_VDBG("undoing WCCOMB on pio buffers\n");
+			mtrr_del(dd->ipath_mtrr, 0, 0);
+			dd->ipath_mtrr = 0;
+		}
+#endif
+	}
+
+	return ret;
+}
+
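+/*
+ * To summarize the teardown ordering in ipath_shutdown_link() above:
+ * link down first, then notify the layered driver, disable the IBC,
+ * quiesce the SerDes and LEDs, stop the stats timer, and finally
+ * invalidate the register mappings so stray accesses "fail" until
+ * re-init.
+ */
+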
+/*
+ * when closing, free up any allocated data for a port, if the reference
+ * count goes to zero.
+ * Note: this also frees the portdata itself!
+ */
+void ipath_free_pddata(ipath_devdata *dd, uint32_t port, int freehdrq)
+{
+	ipath_portdata *pd = dd->ipath_pd[port];
+
+	if (!pd)
+		return;
+	if (freehdrq)
+		/*
+		 * only clear and free portdata if we are going to also
+		 * release the hdrq, otherwise we leak the hdrq on each
+		 * open/close cycle
+		 */
+		dd->ipath_pd[port] = NULL;
+	/* clean up locked pages private data structures */
+	ipath_mlock_cleanup(pd);
+	if (freehdrq && pd->port_rcvhdrq) {
+		int i, n = 1 << pd->port_rcvhdrq_order;
+		_IPATH_VDBG("free closed port %d rcvhdrq @ %p (order=%u)\n",
+			    pd->port_port, pd->port_rcvhdrq,
+			    pd->port_rcvhdrq_order);
+		for (i = 0; i < n; i++)
+			ClearPageReserved(virt_to_page
+					  (pd->port_rcvhdrq + (i * PAGE_SIZE)));
+		free_pages((unsigned long)pd->port_rcvhdrq,
+			   pd->port_rcvhdrq_order);
+		pd->port_rcvhdrq = NULL;
+	}
+	/* the eager receive buffers, however, are always freed */
+	if (port && pd->port_rcvegrbuf_pages) {
+		void *virt;
+		unsigned e, i, n = 1 << pd->port_rcvegrbuf_order;
+		if (pd->port_rcvegrbuf_virt) {
+			for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) {
+				virt = pd->port_rcvegrbuf_virt[e];
+				for (i = 0; i < n; i++)
+					ClearPageReserved(virt_to_page
+							  (virt + (i * PAGE_SIZE)));
+				_IPATH_VDBG(
+					"egrbuf free_pages(%p, %x), chunk %u/%u\n",
+					virt, pd->port_rcvegrbuf_order, e,
+					pd->port_rcvegrbuf_chunks);
+				free_pages((unsigned long)virt,
+					   pd->port_rcvegrbuf_order);
+			}
+			vfree(pd->port_rcvegrbuf_virt);
+			pd->port_rcvegrbuf_virt = NULL;
+		}
+		pd->port_rcvegrbuf_chunks = 0;
+		_IPATH_VDBG("free closed port %d rcvegrbufs ptr array\n",
+			    pd->port_port);
+		/* now the pointer array. */
+		vfree(pd->port_rcvegrbuf_pages);
+		pd->port_rcvegrbuf_pages = NULL;
+	} else if (port == 0 && dd->ipath_port0_skbs) {
+		unsigned e;
+		struct sk_buff **skbs = dd->ipath_port0_skbs;
+
+		dd->ipath_port0_skbs = NULL;
+		_IPATH_VDBG("free closed port %d ipath_port0_skbs @ %p\n",
+			    pd->port_port, skbs);
+		for (e = 0; e < dd->ipath_rcvegrcnt; e++)
+			if (skbs[e])
+				dev_kfree_skb(skbs[e]);
+		vfree(skbs);
+	}
+	if (freehdrq) {
+		kfree(pd->port_tid_pg_list);
+		kfree(pd);
+	}
+}
+
+int __init infinipath_init(void)
+{
+	int r = 0, i;
+
+	_IPATH_DBG(KERN_INFO DRIVER_LOAD_MSG "%s", ipath_core_version);
+
+	ipath_init_picotime();	/* init cycles -> pico conversion */
+
+	if (!ipath_ctl_header) {	/* should always be the case */
+		if (!(ipath_ctl_header = register_sysctl_table(ipath_ctl, 1)))
+			_IPATH_INFO("Couldn't register sysctl interface\n");
+	}
+
+	/*
+	 * initialize the statusp to temporary storage so we can use it
+	 * everywhere without first checking. When we "really" assign it,
+	 * we copy from _ipath_status
+	 */
+	for (i = 0; i < infinipath_max; i++)
+		devdata[i].ipath_statusp = &devdata[i]._ipath_status;
+
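+	/*
+	 * i.e. *dd->ipath_statusp is always safe to dereference, even
+	 * before per-chip init points it at the user-mappable page and
+	 * copies _ipath_status there.
+	 */
+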
+	/*
+	 * init these early, in case we take an interrupt as soon as the
+	 * irq is set up. Saw a spinlock panic once that appeared to be
+	 * due to that problem, when they were initted later on.
+	 */
+	spin_lock_init(&ipath_pioavail_lock);
+	spin_lock_init(&ipath_sma_lock);
+
+	pci_register_driver(&infinipath_driver);
+
+	driver_create_file(&(infinipath_driver.driver), &driver_attr_version);
+
+	if ((r = register_chrdev(ipath_major, MODNAME, &ipath_fops)))
+		_IPATH_ERROR("Unable to register %s device\n", MODNAME);
+
+	/*
+	 * never return an error, since we could have stuff registered,
+	 * resources used, etc., even if no hardware was found. This way
+	 * we can clean up through unload.
+	 */
+	return 0;
+}
+
+/*
+ * note: if for some reason the unload fails after this routine, and leaves
+ * the driver enterable by user code, we'll almost certainly crash and burn...
+ */
+static void __exit infinipath_cleanup(void)
+{
+	int r, m, port;
+
+	driver_remove_file(&(infinipath_driver.driver), &driver_attr_version);
+	if (ipath_ctl_header) {
+		unregister_sysctl_table(ipath_ctl_header);
+		ipath_ctl_header = NULL;
+	} else
+		_IPATH_DBG("No sysctl unregister, not registered OK\n");
+	if ((r = unregister_chrdev(ipath_major, MODNAME)))
+		_IPATH_DBG("unregister of device failed: %d\n", r);
+
+	/*
+	 * turn off rcv, send, and interrupts for all ports, all drivers.
+	 * Should we also hard reset the chip here?
+	 * free up port 0 (kernel) rcvhdr, egr bufs, and eventually tid
+	 * bufs for all versions of the driver, if they were allocated
+	 */
+	for (m = 0; m < infinipath_max; m++) {
+		uint64_t val;
+		ipath_devdata *dd = &devdata[m];
+		if (dd->ipath_kregbase) {
+			/* in case unload fails, be consistent */
+			dd->ipath_rcvctrl = 0U;
+			ipath_kput_kreg(m, kr_rcvctrl, dd->ipath_rcvctrl);
+
+			/*
+			 * gracefully stop all sends, allowing any in
+			 * progress to trickle out first.
+			 */
+			ipath_kput_kreg(m, kr_sendctrl, 0ULL);
+			val = ipath_kget_kreg64(m, kr_scratch); /* flush it */
+			/*
+			 * long enough for anything that's going to trickle
+			 * out to have actually done so.
+			 */
+			udelay(5);
+
+			/*
+			 * abort any armed or launched PIO buffers that
+			 * didn't go (self clearing). Will cause any packet
+			 * currently being transmitted to go out with an
+			 * EBP, and may also cause a short packet error on
+			 * the receiver.
+			 */
+			ipath_kput_kreg(m, kr_sendctrl, INFINIPATH_S_ABORT);
+
+			/* mask interrupts, but not errors */
+			ipath_kput_kreg(m, kr_intmask, 0ULL);
+			ipath_shutdown_link(m);
+
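+			/*
+			 * interrupts were masked (not cleared) just above,
+			 * so anything that fires from here on is discarded;
+			 * the clears below leave a clean slate for the next
+			 * load of the driver.
+			 */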
+			/*
+			 * clear all interrupts and errors. Next time the
+			 * driver is loaded, we know that whatever is set
+			 * happened while we were unloaded
+			 */
+			ipath_kput_kreg(m, kr_hwerrclear, ~0ULL);
+			ipath_kput_kreg(m, kr_errorclear, ~0ULL);
+			ipath_kput_kreg(m, kr_intclear, ~0ULL);
+			if (dd->__ipath_pioavailregs_base) {
+				kfree((void *)dd->__ipath_pioavailregs_base);
+				dd->__ipath_pioavailregs_base = NULL;
+				dd->ipath_pioavailregs_dma = NULL;
+			}
+
+			if (dd->ipath_pageshadow) {
+				struct page **tmpp = dd->ipath_pageshadow;
+				int i, cnt = 0;
+
+				_IPATH_VDBG(
+					"Unlocking any expTID pages still locked\n");
+				for (port = 0; port < dd->ipath_cfgports;
+				     port++) {
+					int port_tidbase =
+						port * dd->ipath_rcvtidcnt;
+					int maxtid =
+						port_tidbase + dd->ipath_rcvtidcnt;
+					for (i = port_tidbase; i < maxtid; i++) {
+						if (tmpp[i]) {
+							ipath_munlock(1, &tmpp[i]);
+							tmpp[i] = NULL;
+							cnt++;
+						}
+					}
+				}
+				if (cnt) {
+					ipath_stats.sps_pageunlocks += cnt;
+					_IPATH_VDBG(
+						"There were still %u expTID entries locked\n",
+						cnt);
+				}
+				if (ipath_stats.sps_pagelocks ||
+				    ipath_stats.sps_pageunlocks)
+					_IPATH_VDBG(
+						"%llu pages locked, %llu unlocked via ipath_m{un}lock\n",
+						ipath_stats.sps_pagelocks,
+						ipath_stats.sps_pageunlocks);
+
+				_IPATH_VDBG("Free shadow page tid array at %p\n",
+					    dd->ipath_pageshadow);
+				vfree(dd->ipath_pageshadow);
+				dd->ipath_pageshadow = NULL;
+			}
+
+			/*
+			 * free any resources still in use (usually just
+			 * kernel ports) at unload
+			 */
+			for (port = 0; port < dd->ipath_cfgports; port++)
+				ipath_free_pddata(dd, port, 1);
+			kfree(dd->ipath_pd);
+			/*
+			 * for debuggability, in case some cleanup path
+			 * tries to use it after this
+			 */
+			dd->ipath_pd = NULL;
+		}
+
+		if (dd->pcidev) {
+			if (dd->pcidev->irq) {
+				_IPATH_VDBG("unit %u free_irq of irq %x\n",
+					    m, dd->pcidev->irq);
+				free_irq(dd->pcidev->irq, dd);
+			} else
+				_IPATH_DBG(
+					"irq is 0, not doing free_irq for unit %u\n",
+					m);
+			dd->pcidev = NULL;
+		}
+		if (dd->pci_registered) {
+			_IPATH_VDBG(
+				"Unregistering pci infrastructure unit %u\n", m);
+			pci_unregister_driver(&infinipath_driver);
+			dd->pci_registered = 0;
+		} else
+			_IPATH_VDBG("unit %u: no pci unreg, wasn't registered\n",
+				    m);
+		/* clean up any per-chip chip-specific stuff */
+		ipath_chip_cleanup(dd);
+	}
+	/*
+	 * clean up any chip-specific stuff; for now, there is only one
+	 * type of chip for any given driver
+	 */
+	ipath_chip_done();
+
+	/* clean up all our locked pages private data structures */
+	ipath_mlock_cleanup(NULL);
+}
+
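+/*
+ * ipath_get_boardname() below just wraps the chip-specific call; callers
+ * pass a stack buffer, as the init code above does (buffer size here is
+ * illustrative only):
+ *
+ *	char boardn[32];
+ *	ipath_get_boardname(t, boardn, sizeof boardn);
+ */
+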
+/*
+ * This is a generic function here, so it can return device-specific
+ * info. This allows keeping it in sync with the version that supports
+ * multiple chip types.
+ */
+void ipath_get_boardname(const ipath_type t, char *name, size_t namelen)
+{
+	ipath_ht_get_boardname(t, name, namelen);
+}
+
+module_init(infinipath_init);
+module_exit(infinipath_cleanup);
+
+EXPORT_SYMBOL(infinipath_debug);
+EXPORT_SYMBOL(ipath_get_boardname);
--
0.99.9n