LKML: Roland Dreier: [PATCH 05/13] [RFC] ipath LLD core, part 2

Subject: [PATCH 05/13] [RFC] ipath LLD core, part 2
Date:    Fri, 16 Dec 2005 15:48:55 -0800
From:    Roland Dreier <>

Next part of ipath core driver

---

 drivers/infiniband/hw/ipath/ipath_driver.c | 2290 ++++++++++++++++++++++++++++
 1 files changed, 2290 insertions(+), 0 deletions(-)

fc2b052ff2abadc8547dc1b319883f9c942b0ae4
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index df650d6..0dee4ce 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -2587,3 +2587,2293 @@ static int ipath_get_unit_counters(struc
 		return -EFAULT;
 	return ipath_get_counters(c.unit, (struct infinipath_counters *)c.data);
 }
+
+/*
+ * ioctls for the control device, which is useful when you don't want
+ * to open the main device and use up a port.
+ */
+
+static int ipath_ctrl_ioctl(struct file *fp, unsigned int cmd, unsigned long a)
+{
+	int ret = 0;
+
+	switch (cmd) {
+	case IPATH_GETSTATS:	/* return driver stats */
+		ret = ipath_get_stats((struct infinipath_stats *) a);
+		break;
+	case IPATH_GETUNITCOUNTERS:	/* return chip counters */
+		ret = ipath_get_unit_counters((struct infinipath_getunitcounters *) a);
+		break;
+	default:
+		_IPATH_DBG("%x not a valid CTRL ioctl for infinipath\n", cmd);
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
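
(Aside: a minimal userspace sketch of how the control device above is meant
to be used -- statistics can be polled without consuming a port.  The device
node name, includes and error handling are illustrative assumptions, not
taken from this patch:)

	/* Illustrative only: the control node path is hypothetical. */
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/ioctl.h>

	int read_driver_stats(struct infinipath_stats *stats)
	{
		int fd = open("/dev/ipath_ctrl", O_RDONLY);	/* hypothetical path */
		int ret;

		if (fd < 0)
			return -1;
		/* dispatched to ipath_ctrl_ioctl() above */
		ret = ioctl(fd, IPATH_GETSTATS, (unsigned long) stats);
		close(fd);
		return ret;
	}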
+
+long ipath_ioctl(struct file *fp, unsigned int cmd, unsigned long a)
+{
+	int ret = 0;
+	ipath_portdata *pd;
+	ipath_type unit;
+	uint32_t tmp, i, nactive = 0;
+
+	if (cmd == IPATH_GETUNITS) {
+		/*
+		 * Return number of units supported.  This is called
+		 * here as this ioctl is needed via both the normal and
+		 * diags interface, and it does not need the device to
+		 * be opened.
+		 */
+		return ipath_get_units();
+	}
+
+	pd = port_fp(fp);
+	if (!pd) {
+		if (IPATH_SMA == (unsigned long)fp->private_data)
+			/* sma separate; no pd */
+			return (long)ipath_sma_ioctl(fp, cmd, a);
+#ifdef IPATH_DIAG
+		else if (IPATH_DIAG == (unsigned long)fp->private_data)
+			/* diags separate; no pd */
+			return (long)ipath_diags_ioctl(fp, cmd, a);
+#endif
+		else if (IPATH_CTRL == (unsigned long)fp->private_data)
+			/* ctrl separate; no pd */
+			return (long)ipath_ctrl_ioctl(fp, cmd, a);
+		else {
+			_IPATH_DBG("NULL pd from fp (%p), cmd=%x\n", fp, cmd);
+			return -ENODEV;	/* bad; shouldn't ever happen */
+		}
+	}
+
+	unit = pd->port_unit;
+
+	if ((devdata[unit].ipath_flags & IPATH_PRESENT)
+	    && (cmd == IPATH_GETCOUNTERS || cmd == IPATH_GETSTATS
+		|| cmd == IPATH_READ_EEPROM || cmd == IPATH_WRITE_EEPROM)) {
+		/* allowed to do these, as long as chip is accessible */
+	} else if (!(devdata[unit].ipath_flags & IPATH_INITTED)) {
+		_IPATH_DBG
+		    ("%s not initialized (flags=0x%x), failing ioctl #%u\n",
+		     ipath_get_unit_name(unit), devdata[unit].ipath_flags,
+		     _IOC_NR(cmd));
+		ret = -ENODEV;
+	} else if ((devdata[unit].ipath_flags &
+		    (IPATH_LINKDOWN | IPATH_LINKUNK))) {
+		_IPATH_DBG("%s link is down, failing ioctl #%u\n",
+			   ipath_get_unit_name(unit), _IOC_NR(cmd));
+		ret = -ENETDOWN;
+	}
+
+	if (ret)
+		return ret;
+
+	/* normal driver ioctls, not sim-specific */
+	switch (cmd) {
+	case IPATH_USERINIT:
+		/* real application is starting on a port */
+		ret = ipath_do_user_init(pd, (struct ipath_user_info *) a);
+		break;
+	case IPATH_BASEINFO:
+		/* it's done the init, now return the info it needs */
+		ret = ipath_get_baseinfo(pd, (struct ipath_base_info *) a);
+		break;
+	case IPATH_GETPORT:
+		/*
+		 * just return the unit:port that we were assigned,
+		 * and the number of active chips.  This is used for
+		 * doing sched_setaffinity() before initialization.
+		 */
+		for (i = 0; i < infinipath_max; i++)
+			if ((devdata[i].ipath_flags & IPATH_PRESENT)
+			    && devdata[i].ipath_kregbase
+			    && devdata[i].ipath_lid
+			    && !(devdata[i].ipath_flags &
+				 (IPATH_LINKDOWN | IPATH_LINKUNK)))
+				nactive++;
+		tmp = (nactive << 24) | (unit << 16) | unit;
+		if (copy_to_user((void *)a, &tmp, sizeof(unit)))
+			ret = -EFAULT;
+		break;
+	case IPATH_GETLID:
+		/* get LID for given unit # */
+		ret = ipath_layer_get_lid(a);
+		break;
+	case IPATH_UPDM_TID:	/* update expected TID entries */
+		ret = ipath_tid_update(pd, (struct _tidupd *)a);
+		break;
+	case IPATH_FREE_TID:	/* free expected TID entries */
+		ret = ipath_tid_free(pd, (struct _tidupd *)a);
+		break;
+	case IPATH_GETCOUNTERS:	/* return chip counters */
+		ret = ipath_get_counters(unit, (struct infinipath_counters *)a);
+		break;
+	case IPATH_GETSTATS:	/* return driver stats */
+		ret = ipath_get_stats((struct infinipath_stats *) a);
+		break;
+	case IPATH_GETUNITCOUNTERS:	/* return chip counters */
+		ret = ipath_get_unit_counters((struct infinipath_getunitcounters *) a);
+		break;
+	case IPATH_SET_PKEY:	/* set a partition key */
+		ret = ipath_set_partkey(pd, (uint16_t) a);
+		break;
+	case IPATH_RCVCTRL:	/* error handling to manage the rcvq */
+		ret = ipath_manage_rcvq(pd, (uint16_t) a);
+		break;
+	case IPATH_WRITE_EEPROM:
+		/* write the eeprom (for GUID) */
+		ret = ipath_wr_eeprom(pd, (struct ipath_eeprom_req *)a);
+		break;
+	case IPATH_READ_EEPROM:	/* read the eeprom (for GUID) */
+		ret = ipath_rd_eeprom(pd->port_unit,
+				      (struct ipath_eeprom_req *)a);
+		break;
+	case IPATH_WAIT:
+		/*
+		 * wait for a receive intr for this port, or PIO avail
+		 */
+		ret = ipath_wait_intr(pd, (uint32_t) a);
+		break;
+
+	default:
+		_IPATH_DBG("cmd %x (%c,%u) not a valid ioctl\n", cmd,
+			   _IOC_TYPE(cmd), _IOC_NR(cmd));
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
+static loff_t ipath_llseek(struct file *fp, loff_t off, int whence)
+{
+	loff_t ret;
+
+	/* range checking is done where offset is used, not here. */
+	down(&fp->f_dentry->d_inode->i_sem);
+	if (!whence)
+		ret = fp->f_pos = off;
+	else if (whence == 1) {
+		fp->f_pos += off;
+		ret = fp->f_pos;
+	} else
+		ret = -EINVAL;
+	up(&fp->f_dentry->d_inode->i_sem);
+	_IPATH_DBG("New offset %llx from seek %llx whence=%d\n", fp->f_pos, off,
+		   whence);
+
+	return ret;
+}
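
(Aside: the IPATH_GETPORT case above packs the active-chip count and the
assigned unit into one 32-bit word, tmp = (nactive << 24) | (unit << 16) |
unit.  A sketch of the matching userspace decode, assuming the word is read
back as a uint32_t; the variable names are illustrative:)

	uint32_t word;	/* filled in by ioctl(fd, IPATH_GETPORT, &word) */
	unsigned nactive = (word >> 24) & 0xff;	/* active chips, for affinity */
	unsigned unit = (word >> 16) & 0xff;	/* assigned unit */
	unsigned low = word & 0xffff;	/* low half also carries the unit */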
+
+/*
+ * We use this to have a shared buffer between the kernel and the user
+ * code for the rcvhdr queue, egr buffers, and the per-port user regs and pio
+ * buffers in the chip.  We have the open and close entries so we can bump
+ * the ref count and keep the driver from being unloaded while still mapped.
+ */
+
+static struct vm_operations_struct ipath_vmops = {
+	.nopage = ipath_nopage,
+};
+
+static int ipath_mmap(struct file *fp, struct vm_area_struct *vm)
+{
+	int setlen = 0, ret = -EINVAL;
+	ipath_portdata *pd;
+
+	if (fp->private_data && 255UL < (unsigned long)fp->private_data) {
+		pd = port_fp(fp);
+		{
+			/*
+			 * This is the ipath_do_user_init() code,
+			 * mapping the shared buffers into the user
+			 * process.  The address referred to by vm_pgoff
+			 * is the virtual, not physical, address; we only
+			 * do one mmap for each space mapped.
+			 */
+			uint64_t pgaddr, ureg;
+
+			pgaddr = vm->vm_pgoff << PAGE_SHIFT;
+
+			/*
+			 * note that ureg does *NOT* have the kregvirt
+			 * as part of it, to be sure that for 32 bit
+			 * programs, we don't end up trying to map
+			 * a > 44 bit address.  Has to match ipath_get_baseinfo()
+			 * code that sets __spi_uregbase
+			 */
+
+			ureg = devdata[pd->port_unit].ipath_uregbase +
+			    devdata[pd->port_unit].ipath_palign * pd->port_port;
+
+			_IPATH_MMDBG
+			    ("ushare: pgaddr %llx vm_start=%lx, vmlen %lx\n",
+			     pgaddr, vm->vm_start, vm->vm_end - vm->vm_start);
+
+			if (pgaddr == ureg) {
+				/* it's the real hardware, so io_remap works */
+				unsigned long phys;
+				if ((vm->vm_end - vm->vm_start) > PAGE_SIZE) {
+					_IPATH_INFO
+					    ("FAIL mmap userreg: reqlen %lx > PAGE\n",
+					     vm->vm_end - vm->vm_start);
+					ret = -EFAULT;
+				} else {
+					phys = devdata[pd->port_unit].ipath_physaddr + ureg;
+					vm->vm_page_prot =
+					    pgprot_noncached(vm->vm_page_prot);
+
+					vm->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND
+					    | VM_IO | VM_SHM | VM_LOCKED;
+					ret = io_remap_pfn_range(vm, vm->vm_start,
+								 phys >> PAGE_SHIFT,
+								 vm->vm_end - vm->vm_start,
+								 vm->vm_page_prot);
+				}
+			} else if (pgaddr == pd->port_piobufs) {
+				/*
+				 * We use io_remap, so there is not a
+				 * nopage handler for this case!
+				 * when we map the PIO buffers, we want
+				 * to map them as writeonly, no read possible.
+				 */
+				unsigned long phys;
+				if ((vm->vm_end - vm->vm_start) >
+				    (devdata[pd->port_unit].ipath_pbufsport *
+				     devdata[pd->port_unit].ipath_palign)) {
+					_IPATH_INFO
+					    ("FAIL mmap piobufs: reqlen %lx > PAGE\n",
+					     vm->vm_end - vm->vm_start);
+					ret = -EFAULT;
+				} else {
+					phys = devdata[pd->port_unit].ipath_physaddr +
+					    pd->port_piobufs;
+					/*
+					 * Do *NOT* mark this as
+					 * non-cached (PWT bit), or we
+					 * don't get the write combining
+					 * behavior we want on the
+					 * PIO buffers!
+					 * vm->vm_page_prot = pgprot_noncached(vm->vm_page_prot);
+					 */
+
+#if defined(pgprot_writecombine) && defined(_PAGE_MA_WC)
+					/* Enable WC */
+					vm->vm_page_prot =
+					    pgprot_writecombine(vm->vm_page_prot);
+#endif
+
+					if (vm->vm_flags & VM_READ) {
+						_IPATH_INFO
+						    ("Can't map piobufs as readable (flags=%lx)\n",
+						     vm->vm_flags);
+						ret = -EPERM;
+					} else {
+						/*
+						 * don't allow them to
+						 * later change to readable
+						 * with mprotect
+						 */
+						vm->vm_flags &= ~VM_MAYREAD;
+
+						vm->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND
+						    | VM_IO | VM_SHM | VM_LOCKED;
+						ret = io_remap_pfn_range(vm, vm->vm_start,
+									 phys >> PAGE_SHIFT,
+									 vm->vm_end - vm->vm_start,
+									 vm->vm_page_prot);
+					}
+				}
+			} else if (pgaddr == (uint64_t) pd->port_rcvegr_phys) {
+				if (!pd->port_rcvegrbuf_virt)
+					return -EFAULT;
+				/*
+				 * page_alloc'ed egr memory, not
+				 * physically contiguous.
+				 * *BUT* to work around the 32 bit mmap64
+				 * only handling 44 bits, we have remapped
+				 * the first page to kernel virtual, so
+				 * we have to do the conversion here to
+				 * get back to the original virtual
+				 * address (not contig pages) so we have
+				 * to mark this for special handling.
+				 */
+
+				/*
+				 * not egrbufs * egrsize since they are
+				 * no longer virtually contiguous.
+				 */
+				setlen = pd->port_rcvegrbuf_chunks * PAGE_SIZE *
+				    (1 << pd->port_rcvegrbuf_order);
+				if ((vm->vm_end - vm->vm_start) > setlen) {
+					_IPATH_INFO
+					    ("FAIL on egr bufs: reqlen %lx > actual %x\n",
+					     vm->vm_end - vm->vm_start, setlen);
+					ret = -EFAULT;
+				} else {
+					vm->vm_ops = &ipath_vmops;
+					vm->vm_private_data =
+					    (void *)(3 | (uint64_t) pd);
+					if (vm->vm_flags & VM_WRITE) {
+						_IPATH_INFO
+						    ("Can't map eager buffers as writable (flags=%lx)\n",
+						     vm->vm_flags);
+						ret = -EPERM;
+					} else {
+						/*
+						 * don't allow them to
+						 * later change to writeable
+						 * with mprotect
+						 */
+						vm->vm_flags &= ~VM_MAYWRITE;
+						_IPATH_MMDBG
+						    ("egrbufs, set private to %p, not %llx\n",
+						     vm->vm_private_data, pgaddr);
+						ret = 0;
+					}
+				}
+			} else if (pgaddr == (uint64_t) pd->port_rcvhdrq_phys) {
+				/*
+				 * kmalloc'ed memory, physically
+				 * contiguous; this is from
+				 * spi_rcvhdr_base; we allow user to
+				 * map read-write so they can write
+				 * hdrq entries to allow protocol code
+				 * to directly poll whether a hdrq entry
+				 * has been written.
+				 */
+				setlen = round_up(devdata[pd->port_unit].ipath_rcvhdrcnt *
+						  devdata[pd->port_unit].ipath_rcvhdrentsize *
+						  sizeof(uint32_t), PAGE_SIZE);
+				if ((vm->vm_end - vm->vm_start) > setlen) {
+					_IPATH_INFO
+					    ("FAIL on rcvhdrq: reqlen %lx > actual %x\n",
+					     vm->vm_end - vm->vm_start, setlen);
+					ret = -EFAULT;
+				} else {
+					vm->vm_ops = &ipath_vmops;
+					vm->vm_private_data = (void *)(pgaddr | 1);
+					ret = 0;
+				}
+			}
+			/*
+			 * when we map the PIO bufferavail registers,
+			 * we want to map them as readonly, no write
+			 * possible.
+			 */
+			else if (pgaddr ==
+				 devdata[pd->port_unit].ipath_pioavailregs_phys) {
+				/*
+				 * kmalloc'ed memory, physically
+				 * contiguous, one page only, readonly
+				 */
+				setlen = PAGE_SIZE;
+				if ((vm->vm_end - vm->vm_start) > setlen) {
+					_IPATH_INFO
+					    ("FAIL on pioavailregs_dma: reqlen %lx > actual %x\n",
+					     vm->vm_end - vm->vm_start, setlen);
+					ret = -EFAULT;
+				} else if (vm->vm_flags & VM_WRITE) {
+					_IPATH_INFO
+					    ("Can't map pioavailregs as writable (flags=%lx)\n",
+					     vm->vm_flags);
+					ret = -EPERM;
+				} else {
+					/*
+					 * don't allow them to later
+					 * change with mprotect
+					 */
+					vm->vm_flags &= ~VM_MAYWRITE;
+					vm->vm_ops = &ipath_vmops;
+					vm->vm_private_data = (void *)(pgaddr | 2);
+					ret = 0;
+				}
+			}
+			if (!ret && setlen) {
+				/* keep page(s) from being swapped, etc. */
+				vm->vm_flags |= VM_DONTEXPAND | VM_DONTCOPY |
+				    VM_RESERVED | VM_IO | VM_SHM;
+			} else {
+				/* failure, or io_remap case */
+				vm->vm_private_data = NULL;
+				if (ret)
+					_IPATH_INFO
+					    ("Failure %d, setlen %d, on addr %lx, off %lx\n",
+					     ret, setlen, vm->vm_start,
+					     vm->vm_pgoff);
+			}
+		}
+	} else		/* something very wrong */
+		_IPATH_INFO("fp_private wasn't set, no mmapping\n");
+
+	return ret;
+}
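
(Aside: each case above is selected purely by the mmap() offset, which the
driver hands back to the application as a cookie -- see the
ipath_get_baseinfo()/__spi_uregbase comment.  A sketch of the matching
userspace call; the wrapper and struct field use are hypothetical:)

	#include <sys/mman.h>

	void *map_uregs(int fd, const struct ipath_base_info *bi, size_t len)
	{
		/* the offset must be exactly the cookie the driver returned,
		 * since ipath_mmap() compares vm_pgoff << PAGE_SHIFT with it */
		return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			    fd, (off_t) bi->__spi_uregbase);
	}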
%u)\n",<br />+ addr, avirt, which);<br />+ }<br />+<br />+done:<br />+ if (vpage != NOPAGE_SIGBUS && vpage != NOPAGE_OOM) {<br />+ if (which == 2)<br />+ /*<br />+ * media/video/video-buf.c doesn't do get_page() for<br />+ * buffer from alloc_page(). Hmmm.<br />+ *<br />+ * keep it from being swapped, complaints if<br />+ * process exits before we [vf]free it, etc,<br />+ * and keep shared page counts correct, etc.<br />+ */<br />+ get_page(vpage);<br />+ mark_page_accessed(vpage);<br />+ if (type)<br />+ *type = VM_FAULT_MINOR;<br />+ } else<br />+ _IPATH_DBG("faultin of addr %lx vaddr %p avirt %lx failed\n",<br />+ addr, vaddr, avirt);<br />+<br />+ return vpage;<br />+}<br />+<br />+/* this is separate to allow for better optimization of ipath_intr() */<br />+<br />+static void ipath_bad_intr(const ipath_type t, uint32_t * unexpectp)<br />+{<br />+ ipath_devdata *dd = &devdata[t];<br />+<br />+ /*<br />+ * sometimes happen during driver init and unload, don't want<br />+ * to process any interrupts at that point<br />+ */<br />+<br />+ /* this is just a bandaid, not a fix, if something goes badly wrong */<br />+ if (++*unexpectp > 100) {<br />+ if (++*unexpectp > 105) {<br />+ /*<br />+ * ok, we must be taking somebody else's interrupts,<br />+ * due to a messed up mptable and/or PIRQ table, so<br />+ * unregister the interrupt. We've seen this<br />+ * during linuxbios development work, and it<br />+ * may happen in the future again.<br />+ */<br />+ if (dd->pcidev && dd->pcidev->irq) {<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Now %u unexpected interrupts, unregistering interrupt handler\n",<br />+ *unexpectp);<br />+ _IPATH_DBG("free_irq of irq %x\n",<br />+ dd->pcidev->irq);<br />+ free_irq(dd->pcidev->irq, dd);<br />+ dd->pcidev->irq = 0;<br />+ }<br />+ }<br />+ if (ipath_kget_kreg32(t, kr_intmask)) {<br />+ _IPATH_UNIT_ERROR(t,<br />+ "%u unexpected interrupts, disabling interrupts completely\n",<br />+ *unexpectp);<br />+ /* disable all interrupts, something is very wrong */<br />+ ipath_kput_kreg(t, kr_intmask, 0ULL);<br />+ }<br />+ } else if (*unexpectp > 1)<br />+ _IPATH_DBG<br />+ ("Interrupt when not ready, should not happen, ignoring\n");<br />+}<br />+<br />+/* separate routine, for better optimization of ipath_intr() */<br />+<br />+static void ipath_bad_regread(const ipath_type t)<br />+{<br />+ static int allbits;<br />+ ipath_devdata *dd = &devdata[t];<br />+<br />+ /*<br />+ * We print the message and disable interrupts, in hope of<br />+ * having a better chance of debugging the problem.<br />+ */<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Read of interrupt status failed (all bits set)\n");<br />+ if (allbits++) {<br />+ /* disable all interrupts, something is very wrong */<br />+ ipath_kput_kreg(t, kr_intmask, 0ULL);<br />+ if (allbits == 2) {<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Still bad interrupt status, unregistering interrupt\n");<br />+ free_irq(dd->pcidev->irq, dd);<br />+ dd->pcidev->irq = 0;<br />+ } else if (allbits > 2) {<br />+ if ((allbits % 10000) == 0)<br />+ printk(".");<br />+ } else<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Disabling interrupts, multiple errors\n");<br />+ }<br />+}<br />+<br />+static irqreturn_t ipath_intr(int irq, void *data, struct pt_regs *regs)<br />+{<br />+ ipath_devdata *dd = data;<br />+ const ipath_type t = IPATH_UNIT(dd);<br />+ uint32_t istat = ipath_kget_kreg32(t, kr_intstatus);<br />+ uint64_t estat = 0;<br />+ static unsigned unexpected = 0;<br />+<br />+ if (unlikely(!istat)) {<br />+ ipath_stats.sps_nullintr++;<br />+ /* not our 
interrupt, or already handled */<br />+ return IRQ_NONE;<br />+ }<br />+ if (unlikely(istat == ~0)) {<br />+ ipath_bad_regread(t);<br />+ /* don't know if it was our interrupt or not */<br />+ return IRQ_NONE;<br />+ }<br />+<br />+ ipath_stats.sps_ints++;<br />+<br />+ /*<br />+ * this needs to be flags&initted, not statusp, so we keep<br />+ * taking interrupts even after link goes down, etc.<br />+ * Also, we *must* clear the interrupt at some point, or we won't<br />+ * take it again, which can be real bad for errors, etc...<br />+ */<br />+<br />+ if (!(dd->ipath_flags & IPATH_INITTED)) {<br />+ ipath_bad_intr(t, &unexpected);<br />+ return IRQ_NONE;<br />+ }<br />+ if (unexpected)<br />+ unexpected = 0;<br />+<br />+ if (istat & ~infinipath_i_bitsextant)<br />+ _IPATH_UNIT_ERROR(t,<br />+ "interrupt with unknown interrupts %x set\n",<br />+ istat & (uint32_t) ~ infinipath_i_bitsextant);<br />+<br />+ if (istat & INFINIPATH_I_ERROR) {<br />+ ipath_stats.sps_errints++;<br />+ estat = ipath_kget_kreg64(t, kr_errorstatus);<br />+ if (!estat)<br />+ _IPATH_INFO<br />+ ("error interrupt (%x), but no error bits set!\n",<br />+ istat);<br />+ else if (estat == ~0ULL)<br />+ /*<br />+ * should we try clearing all, or hope next read<br />+ * works?<br />+ */<br />+ _IPATH_UNIT_ERROR(t,<br />+ "Read of error status failed (all bits set); ignoring\n");<br />+ else<br />+ ipath_handle_errors(t, estat);<br />+ }<br />+<br />+ if (istat & INFINIPATH_I_GPIO) {<br />+ /* Clear GPIO status bit 2 */<br />+ ipath_kput_kreg(t, kr_gpio_clear, (uint64_t)(1 << 2));<br />+<br />+ /*<br />+ * Packets are available in the port 0 receive queue.<br />+ * Eventually this needs to be generalized to check<br />+ * IPATH_GPIO_INTR, and the specific GPIO bit, when<br />+ * GPIO interrupts start being used for other things.<br />+ * We skip that now to improve performance.<br />+ */<br />+ ipath_kreceive(t);<br />+ }<br />+<br />+ /*<br />+ * clear the ones we will deal with on this round<br />+ * We clear it early, mostly for receive interrupts, so we<br />+ * know the chip will have seen this by the time we process<br />+ * the queue, and will re-interrupt if necessary. 
The processor<br />+ * itself won't take the interrupt again until we return.<br />+ */<br />+ ipath_kput_kreg(t, kr_intclear, istat);<br />+<br />+ if (istat & INFINIPATH_I_SPIOBUFAVAIL) {<br />+ atomic_clear_mask(INFINIPATH_S_PIOINTBUFAVAIL,<br />+ &dd->ipath_sendctrl);<br />+ ipath_kput_kreg(t, kr_sendctrl, dd->ipath_sendctrl);<br />+<br />+ if (dd->ipath_portpiowait) {<br />+ uint32_t i;<br />+ /*<br />+ * start from port 1, since for now port 0 is<br />+ * never using wait_event for PIO<br />+ */<br />+ for (i = 1;<br />+ dd->ipath_portpiowait && i < dd->ipath_cfgports;<br />+ i++) {<br />+ if (dd->ipath_pd[i]<br />+ && dd->ipath_portpiowait & (1U << i)) {<br />+ atomic_clear_mask(1U << i,<br />+ &dd-><br />+ ipath_portpiowait);<br />+ if (dd->ipath_pd[i]-><br />+ port_flag & IPATH_PORT_WAITING_PIO)<br />+ {<br />+ dd->ipath_pd[i]->port_flag &=<br />+ ~IPATH_PORT_WAITING_PIO;<br />+ wake_up_interruptible(&dd-><br />+ ipath_pd<br />+ [i]-><br />+ port_wait);<br />+ }<br />+ }<br />+ }<br />+ }<br />+<br />+ if (dd->ipath_layer.l_intr) {<br />+ if (dd->ipath_layer.l_intr(t,<br />+ IPATH_LAYER_INT_SEND_CONTINUE)) {<br />+ atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL,<br />+ &dd->ipath_sendctrl);<br />+ ipath_kput_kreg(t, kr_sendctrl,<br />+ dd->ipath_sendctrl);<br />+ } <br />+ }<br />+<br />+ if (dd->verbs_layer.l_piobufavail) {<br />+ if (!dd->verbs_layer.l_piobufavail(t)) {<br />+ atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL,<br />+ &dd->ipath_sendctrl);<br />+ ipath_kput_kreg(t, kr_sendctrl,<br />+ dd->ipath_sendctrl);<br />+ }<br />+ }<br />+ }<br />+<br />+ /*<br />+ * we check for both transition from empty to non-empty, and urgent<br />+ * packets (those with the interrupt bit set in the header)<br />+ */<br />+<br />+ if (istat & ((infinipath_i_rcvavail_mask << INFINIPATH_I_RCVAVAIL_SHIFT)<br />+ | (infinipath_i_rcvurg_mask << INFINIPATH_I_RCVURG_SHIFT))) {<br />+ uint64_t portr;<br />+ int i;<br />+ uint32_t rcvdint = 0;<br />+<br />+ portr = ((istat >> INFINIPATH_I_RCVAVAIL_SHIFT) &<br />+ infinipath_i_rcvavail_mask)<br />+ | ((istat >> INFINIPATH_I_RCVURG_SHIFT) &<br />+ infinipath_i_rcvurg_mask);<br />+ for (i = 0; i < dd->ipath_cfgports; i++) {<br />+ if (portr & (1 << i) && dd->ipath_pd[i]) {<br />+ if (i == 0)<br />+ ipath_kreceive(t);<br />+ else if (dd->ipath_pd[i]-><br />+ port_flag & IPATH_PORT_WAITING_RCV) {<br />+ atomic_clear_mask<br />+ (IPATH_PORT_WAITING_RCV,<br />+ &dd->ipath_pd[i]->port_flag);<br />+ wake_up_interruptible(&dd->ipath_pd[i]-><br />+ port_wait);<br />+ rcvdint |= 1U << i;<br />+ }<br />+ }<br />+ }<br />+ if (rcvdint) {<br />+ /*<br />+ * only want to take one interrupt, so turn off<br />+ * the rcv interrupt for all the ports that we<br />+ * did the wakeup on (but never for kernel port)<br />+ */<br />+ atomic_clear_mask(rcvdint <<<br />+ INFINIPATH_R_INTRAVAIL_SHIFT,<br />+ &dd->ipath_rcvctrl);<br />+ ipath_kput_kreg(t, kr_rcvctrl, dd->ipath_rcvctrl);<br />+ }<br />+ }<br />+<br />+ return IRQ_HANDLED;<br />+}<br />+<br />+static void ipath_decode_err(char *buf, size_t blen, uint64_t err)<br />+{<br />+ *buf = '\0';<br />+ if (err & INFINIPATH_E_RHDRLEN)<br />+ strlcat(buf, "rhdrlen ", blen);<br />+ if (err & INFINIPATH_E_RBADTID)<br />+ strlcat(buf, "rbadtid ", blen);<br />+ if (err & INFINIPATH_E_RBADVERSION)<br />+ strlcat(buf, "rbadversion ", blen);<br />+ if (err & INFINIPATH_E_RHDR)<br />+ strlcat(buf, "rhdr ", blen);<br />+ if (err & INFINIPATH_E_RLONGPKTLEN)<br />+ strlcat(buf, "rlongpktlen ", blen);<br />+ if (err & 
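
(Aside: ipath_intr() above never reads kr_sendctrl/kr_rcvctrl back from the
chip; it keeps in-memory shadows (dd->ipath_sendctrl, dd->ipath_rcvctrl),
updates those, and writes the whole shadow to the register.  A stripped-down
sketch of that shadow-register pattern; names are illustrative, and the
atomic update the driver does with atomic_clear_mask() is omitted here:)

	static uint32_t sendctrl_shadow;	/* last value written to kr_sendctrl */

	static void pioavail_intr_off(const ipath_type t)
	{
		/* update the shadow first, then write the full shadow out,
		 * so memory and hardware always agree */
		sendctrl_shadow &= ~INFINIPATH_S_PIOINTBUFAVAIL;
		ipath_kput_kreg(t, kr_sendctrl, sendctrl_shadow);
	}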
+
+static void ipath_decode_err(char *buf, size_t blen, uint64_t err)
+{
+	*buf = '\0';
+	if (err & INFINIPATH_E_RHDRLEN)
+		strlcat(buf, "rhdrlen ", blen);
+	if (err & INFINIPATH_E_RBADTID)
+		strlcat(buf, "rbadtid ", blen);
+	if (err & INFINIPATH_E_RBADVERSION)
+		strlcat(buf, "rbadversion ", blen);
+	if (err & INFINIPATH_E_RHDR)
+		strlcat(buf, "rhdr ", blen);
+	if (err & INFINIPATH_E_RLONGPKTLEN)
+		strlcat(buf, "rlongpktlen ", blen);
+	if (err & INFINIPATH_E_RSHORTPKTLEN)
+		strlcat(buf, "rshortpktlen ", blen);
+	if (err & INFINIPATH_E_RMAXPKTLEN)
+		strlcat(buf, "rmaxpktlen ", blen);
+	if (err & INFINIPATH_E_RMINPKTLEN)
+		strlcat(buf, "rminpktlen ", blen);
+	if (err & INFINIPATH_E_RFORMATERR)
+		strlcat(buf, "rformaterr ", blen);
+	if (err & INFINIPATH_E_RUNSUPVL)
+		strlcat(buf, "runsupvl ", blen);
+	if (err & INFINIPATH_E_RUNEXPCHAR)
+		strlcat(buf, "runexpchar ", blen);
+	if (err & INFINIPATH_E_RIBFLOW)
+		strlcat(buf, "ribflow ", blen);
+	if (err & INFINIPATH_E_REBP)
+		strlcat(buf, "EBP ", blen);
+	if (err & INFINIPATH_E_SUNDERRUN)
+		strlcat(buf, "sunderrun ", blen);
+	if (err & INFINIPATH_E_SPIOARMLAUNCH)
+		strlcat(buf, "spioarmlaunch ", blen);
+	if (err & INFINIPATH_E_SUNEXPERRPKTNUM)
+		strlcat(buf, "sunexperrpktnum ", blen);
+	if (err & INFINIPATH_E_SDROPPEDDATAPKT)
+		strlcat(buf, "sdroppeddatapkt ", blen);
+	if (err & INFINIPATH_E_SDROPPEDSMPPKT)
+		strlcat(buf, "sdroppedsmppkt ", blen);
+	if (err & INFINIPATH_E_SMAXPKTLEN)
+		strlcat(buf, "smaxpktlen ", blen);
+	if (err & INFINIPATH_E_SMINPKTLEN)
+		strlcat(buf, "sminpktlen ", blen);
+	if (err & INFINIPATH_E_SUNSUPVL)
+		strlcat(buf, "sunsupVL ", blen);
+	if (err & INFINIPATH_E_SPKTLEN)
+		strlcat(buf, "spktlen ", blen);
+	if (err & INFINIPATH_E_INVALIDADDR)
+		strlcat(buf, "invalidaddr ", blen);
+	if (err & INFINIPATH_E_RICRC)
+		strlcat(buf, "CRC ", blen);
+	if (err & INFINIPATH_E_RVCRC)
+		strlcat(buf, "VCRC ", blen);
+	if (err & INFINIPATH_E_RRCVEGRFULL)
+		strlcat(buf, "rcvegrfull ", blen);
+	if (err & INFINIPATH_E_RRCVHDRFULL)
+		strlcat(buf, "rcvhdrfull ", blen);
+	if (err & INFINIPATH_E_IBSTATUSCHANGED)
+		strlcat(buf, "ibcstatuschg ", blen);
+	if (err & INFINIPATH_E_RIBLOSTLINK)
+		strlcat(buf, "riblostlink ", blen);
+	if (err & INFINIPATH_E_HARDWARE)
+		strlcat(buf, "hardware ", blen);
+	if (err & INFINIPATH_E_RESET)
+		strlcat(buf, "reset ", blen);
+}
+
+/* decode RHF errors; only used one place now, may want more later */
+static void get_rhf_errstring(uint32_t err, char *msg, size_t len)
+{
+	/* if no errors, and so don't need to check what's first */
+	*msg = '\0';
+
+	if (err & INFINIPATH_RHF_H_ICRCERR)
+		strlcat(msg, "icrcerr ", len);
+	if (err & INFINIPATH_RHF_H_VCRCERR)
+		strlcat(msg, "vcrcerr ", len);
+	if (err & INFINIPATH_RHF_H_PARITYERR)
+		strlcat(msg, "parityerr ", len);
+	if (err & INFINIPATH_RHF_H_LENERR)
+		strlcat(msg, "lenerr ", len);
+	if (err & INFINIPATH_RHF_H_MTUERR)
+		strlcat(msg, "mtuerr ", len);
+	if (err & INFINIPATH_RHF_H_IHDRERR)
+		/* infinipath hdr checksum error */
+		strlcat(msg, "ipathhdrerr ", len);
+	if (err & INFINIPATH_RHF_H_TIDERR)
+		strlcat(msg, "tiderr ", len);
+	if (err & INFINIPATH_RHF_H_MKERR)
+		/* bad port, offset, etc. */
+		strlcat(msg, "invalid ipathhdr ", len);
+	if (err & INFINIPATH_RHF_H_IBERR)
+		strlcat(msg, "iberr ", len);
+	if (err & INFINIPATH_RHF_L_SWA)
+		strlcat(msg, "swA ", len);
+	if (err & INFINIPATH_RHF_L_SWB)
+		strlcat(msg, "swB ", len);
+}
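
(Aside: both decoders above are long if/strlcat chains; the same mapping can
be written table-driven, which keeps each bit next to its name.  A sketch of
the idea, not part of the patch:)

	static const struct {
		uint64_t bit;
		const char *name;
	} err_names[] = {
		{ INFINIPATH_E_RHDRLEN, "rhdrlen " },
		{ INFINIPATH_E_RBADTID, "rbadtid " },
		/* ... one entry per INFINIPATH_E_* bit ... */
	};

	static void decode_err_tab(char *buf, size_t blen, uint64_t err)
	{
		size_t i;

		*buf = '\0';
		for (i = 0; i < sizeof(err_names) / sizeof(err_names[0]); i++)
			if (err & err_names[i].bit)
				strlcat(buf, err_names[i].name, blen);
	}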
+
+static void ipath_handle_errors(const ipath_type t, uint64_t errs)
+{
+	char msg[512];
+	uint32_t piobcnt;
+	uint64_t sbuf[4], ignore_this_time = 0;
+	int i;
+	int chkerrpkts = 0, noprint = 0;
+	cycles_t nc;
+	static cycles_t nextmsg_time;
+	static unsigned nmsgs, supp_msgs;
+	ipath_devdata *dd = &devdata[t];
+
+#define E_SUM_PKTERRS (INFINIPATH_E_RHDRLEN | INFINIPATH_E_RBADTID \
+	| INFINIPATH_E_RBADVERSION \
+	| INFINIPATH_E_RHDR | INFINIPATH_E_RLONGPKTLEN | INFINIPATH_E_RSHORTPKTLEN \
+	| INFINIPATH_E_RMAXPKTLEN | INFINIPATH_E_RMINPKTLEN \
+	| INFINIPATH_E_RFORMATERR | INFINIPATH_E_RUNSUPVL | INFINIPATH_E_RUNEXPCHAR \
+	| INFINIPATH_E_REBP)
+
+#define E_SUM_ERRS (INFINIPATH_E_SPIOARMLAUNCH \
+	| INFINIPATH_E_SUNEXPERRPKTNUM | INFINIPATH_E_SDROPPEDDATAPKT \
+	| INFINIPATH_E_SDROPPEDSMPPKT | INFINIPATH_E_SMAXPKTLEN \
+	| INFINIPATH_E_SUNSUPVL | INFINIPATH_E_SMINPKTLEN | INFINIPATH_E_SPKTLEN \
+	| INFINIPATH_E_INVALIDADDR)
+
+	/*
+	 * throttle back "fast" messages to no more than 10 per 5 seconds
+	 * (1.4-2GHz clock).  This isn't perfect, but it's a reasonable
+	 * heuristic.
+	 * If we get more than 10, give a 5x longer delay.
+	 */
+	nc = get_cycles();
+	if (nmsgs > 10) {
+		if (nc < nextmsg_time) {
+			noprint = 1;
+			if (!supp_msgs++)
+				nextmsg_time = nc + 50000000000ULL;
+		} else if (supp_msgs) {
+			/*
+			 * Print the message unless it's ibc status
+			 * change only, which happens so often we never
+			 * want to count it.
+			 */
+			if (dd->ipath_lasterror & ~INFINIPATH_E_IBSTATUSCHANGED) {
+				ipath_decode_err(msg, sizeof msg,
+						 dd->ipath_lasterror &
+						 ~INFINIPATH_E_IBSTATUSCHANGED);
+				if (dd->ipath_lasterror &
+				    ~(INFINIPATH_E_RRCVEGRFULL |
+				      INFINIPATH_E_RRCVHDRFULL))
+					_IPATH_UNIT_ERROR(t,
+							  "Suppressed %u messages for fast-repeating errors (%s) (%llx)\n",
+							  supp_msgs, msg,
+							  dd->ipath_lasterror);
+				else {
+					/*
+					 * rcvegrfull and rcvhdrqfull are
+					 * "normal", for some types of
+					 * processes (mostly benchmarks)
+					 * that send huge numbers of
+					 * messages, while not processing
+					 * them.  So only complain about
+					 * these at debug level.
+					 */
+					_IPATH_DBG
+					    ("Suppressed %u messages for %s\n",
+					     supp_msgs, msg);
+				}
+			}
+			supp_msgs = 0;
+			nmsgs = 0;
+		}
+	} else if (!nmsgs++ || nc > nextmsg_time)	/* start timer */
+		nextmsg_time = nc + 10000000000ULL;
+
+	/*
+	 * don't report errors that are masked (includes those always
+	 * ignored)
+	 */
+	errs &= ~dd->ipath_maskederrs;
+
+	/* do these first, they are most important */
+	if (errs & INFINIPATH_E_HARDWARE) {
+		/* reuse same msg buf */
+		ipath_handle_hwerrors(t, msg, sizeof msg);
+	}
+
+	if (!noprint && (errs & ~infinipath_e_bitsextant))
+		_IPATH_UNIT_ERROR(t,
+				  "error interrupt with unknown errors %llx set\n",
+				  errs & ~infinipath_e_bitsextant);
+
+	if (errs & E_SUM_ERRS) {
+		/* if possible that sendbuffererror could be valid */
+		piobcnt = dd->ipath_piobcnt;
+		/* read these before writing errorclear */
+		sbuf[0] = ipath_kget_kreg64(t, kr_sendbuffererror);
+		sbuf[1] = ipath_kget_kreg64(t, kr_sendbuffererror + 1);
+		if (piobcnt > 128) {
+			sbuf[2] = ipath_kget_kreg64(t, kr_sendbuffererror + 2);
+			sbuf[3] = ipath_kget_kreg64(t, kr_sendbuffererror + 3);
+		}
+
+		if (sbuf[0] || sbuf[1]
+		    || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) {
+			_IPATH_PDBG("SendbufErrs %llx %llx ", sbuf[0], sbuf[1]);
+			if (infinipath_debug & __IPATH_PKTDBG && piobcnt > 128)
+				printk("%llx %llx ", sbuf[2], sbuf[3]);
+			for (i = 0; i < piobcnt; i++) {
+				if (test_bit(i, sbuf)) {
+					uint32_t sendctrl;
+					if (infinipath_debug & __IPATH_PKTDBG)
+						printk("%u ", i);
+					sendctrl = dd->ipath_sendctrl |
+					    INFINIPATH_S_DISARM |
+					    (i << INFINIPATH_S_DISARMPIOBUF_SHIFT);
+					ipath_kput_kreg(t, kr_sendctrl, sendctrl);
+				}
+			}
+			if (infinipath_debug & __IPATH_PKTDBG)
+				printk("\n");
+		}
+		if ((errs &
+		     (INFINIPATH_E_SDROPPEDDATAPKT | INFINIPATH_E_SDROPPEDSMPPKT
+		      | INFINIPATH_E_SMINPKTLEN))
+		    && !(dd->ipath_flags & IPATH_LINKACTIVE)) {
+			/*
+			 * This can happen when SMA is trying to bring
+			 * the link up, but the IB link changes state
+			 * at the "wrong" time.  The IB logic then
+			 * complains that the packet isn't valid.
+			 * We don't want to confuse people, so we just
+			 * don't print them, except at debug
+			 */
+			_IPATH_DBG
+			    ("Ignoring pktsend errors %llx, because not yet active\n",
+			     errs);
+			ignore_this_time |= INFINIPATH_E_SDROPPEDDATAPKT |
+			    INFINIPATH_E_SDROPPEDSMPPKT |
+			    INFINIPATH_E_SMINPKTLEN;
+		}
+	}
+
+	if (supp_msgs == 250000) {
+		/*
+		 * It's not entirely reasonable assuming that the errors
+		 * set in the last clear period are all responsible for
+		 * the problem, but the alternative is to assume it's the only
+		 * ones on this particular interrupt, which also isn't great
+		 */
+		dd->ipath_maskederrs |= dd->ipath_lasterror | errs;
+		ipath_kput_kreg(t, kr_errormask, ~dd->ipath_maskederrs);
+		ipath_decode_err(msg, sizeof msg,
+				 (dd->ipath_maskederrs &
+				  ~dd->ipath_ignorederrs));
+
+		if ((dd->ipath_maskederrs & ~dd->ipath_ignorederrs)
+		    & ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL))
+			_IPATH_UNIT_ERROR(t,
+					  "Disabling error(s) %llx because occurring too frequently (%s)\n",
+					  (dd->ipath_maskederrs &
+					   ~dd->ipath_ignorederrs), msg);
+		else {
+			/*
+			 * rcvegrfull and rcvhdrqfull are "normal",
+			 * for some types of processes (mostly benchmarks)
+			 * that send huge numbers of messages, while not
+			 * processing them.  So only complain about
+			 * these at debug level.
+			 */
+			_IPATH_DBG
+			    ("Disabling frequent queue full errors (%s)\n",
+			     msg);
+		}
+
+		/*
+		 * re-enable the masked errors after around 3 minutes,
+		 * in ipath_get_faststats().  If we have a series of
+		 * fast repeating but different errors, the interval will keep
+		 * stretching out, but that's OK, as that's pretty catastrophic.
+		 */
+		dd->ipath_unmasktime = nc + 400000000000ULL;
+	}
+
+	ipath_kput_kreg(t, kr_errorclear, errs);
+	if (ignore_this_time)
+		errs &= ~ignore_this_time;
+	if (errs & ~dd->ipath_lasterror) {
+		errs &= ~dd->ipath_lasterror;
+		/* never suppress duplicate hwerrors or ibstatuschange */
+		dd->ipath_lasterror |= errs &
+		    ~(INFINIPATH_E_HARDWARE | INFINIPATH_E_IBSTATUSCHANGED);
+	}
+	if (!errs)
+		return;
+
+	if (!noprint)
+		/* the ones we mask off are handled specially below or above */
+		ipath_decode_err(msg, sizeof msg,
+				 errs & ~(INFINIPATH_E_IBSTATUSCHANGED |
+					  INFINIPATH_E_RRCVEGRFULL |
+					  INFINIPATH_E_RRCVHDRFULL |
+					  INFINIPATH_E_HARDWARE));
+	else
+		/* so we don't need if (!noprint) at strlcat's below */
+		*msg = 0;
+
+	if (errs & E_SUM_PKTERRS) {
+		ipath_stats.sps_pkterrs++;
+		chkerrpkts = 1;
+	}
+	if (errs & E_SUM_ERRS)
+		ipath_stats.sps_errs++;
+
+	if (errs & (INFINIPATH_E_RICRC | INFINIPATH_E_RVCRC)) {
+		ipath_stats.sps_crcerrs++;
+		chkerrpkts = 1;
+	}
+
+	/*
+	 * We don't want to print these two as they happen, or we can make
+	 * the situation even worse, because it takes so long to print
+	 * messages to serial consoles.  kernel ports get printed from
+	 * fast_stats, no more than every 5 seconds, user ports get printed
+	 * on close
+	 */
+	if (errs & INFINIPATH_E_RRCVHDRFULL) {
+		int any;
+		uint32_t hd, tl;
+		ipath_stats.sps_hdrqfull++;
+		for (any = i = 0; i < dd->ipath_cfgports; i++) {
+			if (i == 0) {
+				hd = dd->ipath_port0head;
+				tl = *dd->ipath_hdrqtailptr;
+			} else if (dd->ipath_pd[i] &&
+				   dd->ipath_pd[i]->port_rcvhdrtail_kvaddr) {
+				/*
+				 * don't report same point multiple times,
+				 * except kernel
+				 */
+				tl = (uint32_t) *
+				    dd->ipath_pd[i]->port_rcvhdrtail_kvaddr;
+				if (tl == dd->ipath_lastrcvhdrqtails[i])
+					continue;
+				hd = ipath_kget_ureg32(t, ur_rcvhdrhead, i);
+			} else
+				continue;
+			if (hd == (tl + 1) || (!hd && tl == dd->ipath_hdrqlast)) {
+				dd->ipath_lastrcvhdrqtails[i] = tl;
+				dd->ipath_pd[i]->port_hdrqfull++;
+				if (i == 0)
+					chkerrpkts = 1;
+			}
+		}
+	}
+	if (errs & INFINIPATH_E_RRCVEGRFULL) {
+		/*
+		 * since this is of less importance and not likely to
+		 * happen without also getting hdrfull, only count
+		 * occurrences; don't check each port (or even the kernel
+		 * vs user)
+		 */
+		ipath_stats.sps_etidfull++;
+		if (dd->ipath_port0head != *dd->ipath_hdrqtailptr)
+			chkerrpkts = 1;
+	}
+
+	/*
+	 * do this before IBSTATUSCHANGED, in case both bits set in a single
+	 * interrupt; we want the STATUSCHANGE to "win", so we do our
+	 * internal copy of state machine correctly
+	 */
+	if (errs & INFINIPATH_E_RIBLOSTLINK) {
+		/* force through block below */
+		errs |= INFINIPATH_E_IBSTATUSCHANGED;
+		ipath_stats.sps_iblink++;
+		dd->ipath_flags |= IPATH_LINKDOWN;
+		dd->ipath_flags &= ~(IPATH_LINKUNK | IPATH_LINKINIT
+				     | IPATH_LINKARMED | IPATH_LINKACTIVE);
+		if (!noprint)
+			_IPATH_DBG("Lost link, link now down (%s)\n",
+				   ipath_ibcstatus_str[ipath_kget_kreg64
+						       (t, kr_ibcstatus) & 0xf]);
+	}
+
+	if ((errs & INFINIPATH_E_IBSTATUSCHANGED) && (!ipath_diags_enabled)) {
+		uint64_t val;
+		uint32_t ltstate;
+
+		val = ipath_kget_kreg64(t, kr_ibcstatus);
+		ltstate = val & 0xff;
+		if (ltstate == 0x11 || ltstate == 0x21 || ltstate == 0x31)
+			_IPATH_DBG("Link state changed unit %u to 0x%x, last was 0x%llx\n",
+				   t, ltstate, dd->ipath_lastibcstat);
+		else {
+			ltstate = dd->ipath_lastibcstat & 0xff;
+			if (ltstate == 0x11 || ltstate == 0x21 || ltstate == 0x31)
+				_IPATH_DBG("Link state unit %u changed to down state 0x%llx, last was 0x%llx\n",
+					   t, val, dd->ipath_lastibcstat);
+			else
+				_IPATH_VDBG("Link state unit %u changed to 0x%llx from one of down states\n",
+					    t, val);
+		}
+		ltstate = (val >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) &
+		    INFINIPATH_IBCS_LINKTRAININGSTATE_MASK;
+
+		if (ltstate == 2 || ltstate == 3) {
+			uint32_t last_ltstate;
+
+			/*
+			 * ignore cycling back and forth from states 2 to 3
+			 * while waiting for other end of link to come up,
+			 * except that if it keeps happening, we switch between
+			 * linkinitstate SLEEP and POLL.  While we cycle
+			 * back and forth between them, we aren't seeing
+			 * any other device, either no cable plugged in,
+			 * other device powered off, other device is
+			 * switch that hasn't yet polled us, etc.
+			 */
+			last_ltstate = (dd->ipath_lastibcstat >>
+					INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT)
+			    & INFINIPATH_IBCS_LINKTRAININGSTATE_MASK;
+			if (last_ltstate == 2 || last_ltstate == 3) {
+				if (++dd->ipath_ibpollcnt > 4) {
+					uint64_t ibc;
+					dd->ipath_flags |=
+					    IPATH_LINK_SLEEPING | IPATH_NOCABLE;
+					*dd->ipath_statusp |=
+					    IPATH_STATUS_IB_NOCABLE;
+					_IPATH_VDBG
+					    ("linkinitcmd POLL, move to SLEEP\n");
+					ibc = dd->ipath_ibcctrl;
+					ibc |= INFINIPATH_IBCC_LINKINITCMD_SLEEP <<
+					    INFINIPATH_IBCC_LINKINITCMD_SHIFT;
+					/*
+					 * don't put linkinitcmd in
+					 * ipath_ibcctrl, want that to
+					 * stay a NOP
+					 */
+					ipath_kput_kreg(t, kr_ibcctrl, ibc);
+					dd->ipath_ibpollcnt = 0;
+				}
+				goto skip_ibchange;
+			}
+		}
+		/* some state other than 2 or 3 */
+		dd->ipath_ibpollcnt = 0;
+		ipath_stats.sps_iblink++;
+		/*
+		 * Note: We try to match the Mellanox HCA LED behavior
+		 * as best we can.  That changed around Oct 2003.
+		 * Green indicates link state (something is plugged in,
+		 * and we can train).  Amber indicates the link is
+		 * logically up (ACTIVE).  Mellanox further blinks the
+		 * amber LED to indicate data packet activity, but we
+		 * have no hardware support for that, so it would require
+		 * waking up every 10-20 msecs and checking the counters
+		 * on the chip, and then turning the LED off if
+		 * appropriate.  That's visible overhead, so not something
+		 * we will do.
+		 */
+		if (ltstate != 1 || ((dd->ipath_lastibcstat & 0x30) == 0x30 &&
+				     (val & 0x30) != 0x30)) {
+			dd->ipath_flags |= IPATH_LINKDOWN;
+			dd->ipath_flags &= ~(IPATH_LINKUNK | IPATH_LINKINIT
+					     | IPATH_LINKACTIVE | IPATH_LINKARMED);
+			*dd->ipath_statusp &= ~IPATH_STATUS_IB_READY;
+			if (!noprint) {
+				if ((dd->ipath_lastibcstat & 0x30) == 0x30)
+					/* if from up to down be more vocal */
+					_IPATH_DBG("Link unit %u is now down (%s)\n",
+						   t, ipath_ibcstatus_str[ltstate]);
+				else
+					_IPATH_VDBG("Link unit %u is down (%s)\n",
+						    t, ipath_ibcstatus_str[ltstate]);
+			}
+
+			if (val & 0x30) {
+				/* leave just green on, 0x11 and 0x21 */
+				dd->ipath_extctrl &=
+				    ~INFINIPATH_EXTC_LEDPRIPORTYELLOWON;
+				dd->ipath_extctrl |=
+				    INFINIPATH_EXTC_LEDPRIPORTGREENON;
+			} else	/* not up at all, so turn the leds off */
+				dd->ipath_extctrl &=
+				    ~(INFINIPATH_EXTC_LEDPRIPORTGREENON |
+				      INFINIPATH_EXTC_LEDPRIPORTYELLOWON);
+			ipath_kput_kreg(t, kr_extctrl,
+					(uint64_t) dd->ipath_extctrl);
+			if (ltstate == 1
+			    && (dd->ipath_flags & (IPATH_LINK_TOARMED |
+						   IPATH_LINK_TOACTIVE))) {
+				ipath_set_ib_lstate(t,
+						    INFINIPATH_IBCC_LINKCMD_INIT);
+			}
+		} else if ((val & 0x31) == 0x31) {
+			if (!noprint)
+				_IPATH_DBG("Link unit %u is now in active state\n", t);
+			dd->ipath_flags |= IPATH_LINKACTIVE;
+			dd->ipath_flags &=
+			    ~(IPATH_LINKUNK | IPATH_LINKINIT | IPATH_LINKDOWN |
+			      IPATH_LINKARMED | IPATH_NOCABLE |
+			      IPATH_LINK_TOACTIVE | IPATH_LINK_SLEEPING);
+			*dd->ipath_statusp &= ~IPATH_STATUS_IB_NOCABLE;
+			*dd->ipath_statusp |=
+			    IPATH_STATUS_IB_READY | IPATH_STATUS_IB_CONF;
+			/* set the externally visible LEDs to indicate state */
+			dd->ipath_extctrl |= INFINIPATH_EXTC_LEDPRIPORTGREENON
+			    | INFINIPATH_EXTC_LEDPRIPORTYELLOWON;
+			ipath_kput_kreg(t, kr_extctrl,
+					(uint64_t) dd->ipath_extctrl);
+
+			/*
+			 * since we are now active, set the linkinitcmd
+			 * to NOP (0); it was probably either POLL or SLEEP
+			 */
+			dd->ipath_ibcctrl &=
+			    ~(INFINIPATH_IBCC_LINKINITCMD_MASK <<
+			      INFINIPATH_IBCC_LINKINITCMD_SHIFT);
+			ipath_kput_kreg(t, kr_ibcctrl, dd->ipath_ibcctrl);
+
+			if (devdata[t].ipath_layer.l_intr)
+				devdata[t].ipath_layer.l_intr(t,
+							      IPATH_LAYER_INT_IF_UP);
+		} else if ((val & 0x31) == 0x11) {
+			/*
+			 * set INIT and DOWN.  Down is checked by
+			 * most of the other code, but INIT is useful
+			 * to know in a few places.
+			 */
+			dd->ipath_flags |= IPATH_LINKINIT | IPATH_LINKDOWN;
+			dd->ipath_flags &=
+			    ~(IPATH_LINKUNK | IPATH_LINKACTIVE | IPATH_LINKARMED
+			      | IPATH_NOCABLE | IPATH_LINK_SLEEPING);
+			*dd->ipath_statusp &= ~(IPATH_STATUS_IB_NOCABLE
+						| IPATH_STATUS_IB_READY);
+
+			/* set the externally visible LEDs to indicate state */
+			dd->ipath_extctrl &= ~INFINIPATH_EXTC_LEDPRIPORTYELLOWON;
+			dd->ipath_extctrl |= INFINIPATH_EXTC_LEDPRIPORTGREENON;
+			ipath_kput_kreg(t, kr_extctrl,
+					(uint64_t) dd->ipath_extctrl);
+			if (dd->ipath_flags & (IPATH_LINK_TOARMED |
+					       IPATH_LINK_TOACTIVE)) {
+				/*
+				 * if we got here while trying to bring
+				 * the link up, try again, but only once more!
+				 */
+				ipath_set_ib_lstate(t,
+						    INFINIPATH_IBCC_LINKCMD_ARMED);
+				dd->ipath_flags &=
+				    ~(IPATH_LINK_TOARMED | IPATH_LINK_TOACTIVE);
+			}
+		} else if ((val & 0x31) == 0x21) {
+			dd->ipath_flags |= IPATH_LINKARMED;
+			dd->ipath_flags &=
+			    ~(IPATH_LINKUNK | IPATH_LINKDOWN | IPATH_LINKINIT |
+			      IPATH_LINKACTIVE | IPATH_NOCABLE |
+			      IPATH_LINK_TOARMED | IPATH_LINK_SLEEPING);
+			*dd->ipath_statusp &= ~(IPATH_STATUS_IB_NOCABLE
+						| IPATH_STATUS_IB_READY);
+			/*
+			 * set the externally visible LEDs to indicate
+			 * state (same as 0x11)
+			 */
+			dd->ipath_extctrl &= ~INFINIPATH_EXTC_LEDPRIPORTYELLOWON;
+			dd->ipath_extctrl |= INFINIPATH_EXTC_LEDPRIPORTGREENON;
+			ipath_kput_kreg(t, kr_extctrl,
+					(uint64_t) dd->ipath_extctrl);
+			if (dd->ipath_flags & IPATH_LINK_TOACTIVE) {
+				/*
+				 * if we got here while trying to bring
+				 * the link up, try again, but only once more!
+				 */
+				ipath_set_ib_lstate(t,
+						    INFINIPATH_IBCC_LINKCMD_ACTIVE);
+				dd->ipath_flags &= ~IPATH_LINK_TOACTIVE;
+			}
+		} else {
+			if (dd->ipath_flags & (IPATH_LINK_TOARMED |
+					       IPATH_LINK_TOACTIVE))
+				ipath_set_ib_lstate(t,
+						    INFINIPATH_IBCC_LINKCMD_INIT);
+			else if (!noprint)
+				_IPATH_DBG("IBstatuschange unit %u: %s\n",
+					   t, ipath_ibcstatus_str[ltstate]);
+		}
+		dd->ipath_lastibcstat = val;
+	}
+
+skip_ibchange:
+
+	if (errs & INFINIPATH_E_RESET) {
+		if (!noprint)
+			_IPATH_UNIT_ERROR(t,
+					  "Got reset, requires re-initialization (unload and reload driver)\n");
+		dd->ipath_flags &= ~IPATH_INITTED;	/* needs re-init */
+		/* mark as having had error */
+		*dd->ipath_statusp |= IPATH_STATUS_HWERROR;
+		*dd->ipath_statusp &= ~IPATH_STATUS_IB_CONF;
+	}
+
+	if (!noprint && *msg)
+		_IPATH_UNIT_ERROR(t, "%s error\n", msg);
+	if (dd->ipath_sma_state_wanted & dd->ipath_flags) {
+		_IPATH_VDBG("sma wanted state %x, iflags now %x, waking\n",
+			    dd->ipath_sma_state_wanted, dd->ipath_flags);
+		wake_up_interruptible(&ipath_sma_state_wait);
+	}
+
+	if (chkerrpkts)
+		/* process possible error packets in hdrq */
+		ipath_kreceive(t);
+}
+
+/* must only be called if ipath_pd[port] is known to be allocated */
+static __inline__ void *ipath_get_egrbuf(const ipath_type t, uint32_t bufnum,
+					 int err)
+{
+	return devdata[t].ipath_port0_skbs ?
+	    (void *)devdata[t].ipath_port0_skbs[bufnum]->data : NULL;
+
+#ifdef _USE_FOR_DEBUGGING_ONLY
+	/*
+	 * want routine to be inlined and fast.  this is here so if we do ports
+	 * other than 0, I don't have to rewrite the code, since it's slightly
+	 * complicated
+	 */
+	if (port != 1) {
+		void *chunkbase;
+		/*
+		 * This calculation takes about 50 cycles.  Could do
+		 * what I did for protocol code, and have an array of
+		 * addresses, getting it down to just a few cycles per
+		 * lookup, at the cost of 16KB of memory.
+		 */
+		if (!devdata[t].ipath_pd[port]->port_rcvegrbuf_virt)
+			return NULL;
+		chunkbase = devdata[t].ipath_pd[port]->port_rcvegrbuf_virt
+		    [bufnum /
+		     devdata[t].ipath_pd[port]->port_rcvegrbufs_perchunk];
+		return (void *)(chunkbase +
+				(bufnum %
+				 devdata[t].ipath_pd[port]->port_rcvegrbufs_perchunk)
+				* devdata[t].ipath_rcvegrbufsize);
+	}
+#endif
+}
+
+/* receive an sma packet.  Separate for better overall optimization */
+static void ipath_rcv_sma(const ipath_type t, uint32_t tlen,
+			  uint64_t * rc, void *ebuf)
+{
+	int sindex, slen, elen;
+	void *smbuf;
+	uint8_t pad, *bthbytes;
+
+	ipath_stats.sps_sma_rpkts++;	/* another SMA packet received */
+
+	bthbytes = (uint8_t *) ((ips_message_header_typ *) &rc[1])->bth;
+
+	pad = (bthbytes[1] >> 4) & 3;
+	elen = tlen - (IPATH_SMA_HDRSZ + pad + (uint32_t) sizeof(uint32_t));
+	if (elen > (SMA_MAX_PKTSZ - IPATH_SMA_HDRSZ))
+		elen = SMA_MAX_PKTSZ - IPATH_SMA_HDRSZ;
+
+	spin_lock_irq(&ipath_sma_lock);
+	sindex = ipath_sma_next;
+	smbuf = ipath_sma_data[sindex].buf;
+	ipath_sma_data[sindex].unit = t;
+	slen = ipath_sma_data[ipath_sma_next].len;
+	memcpy(smbuf, &rc[1], IPATH_SMA_HDRSZ);
+	memcpy(smbuf + IPATH_SMA_HDRSZ, ebuf, elen);
+	if (slen) {
+		/*
+		 * overwriting a yet unread old one (buffer wrap), have to
+		 * advance ipath_sma_first to next oldest
+		 */
+
+		/* count OK packets that we drop */
+		ipath_stats.sps_krdrops++;
+		if (++ipath_sma_first >= IPATH_NUM_SMAPKTS)
+			ipath_sma_first = 0;
+	}
+	slen = ipath_sma_data[sindex].len = elen + IPATH_SMA_HDRSZ;
+	if (++ipath_sma_next >= IPATH_NUM_SMAPKTS)
+		ipath_sma_next = 0;
+	spin_unlock_irq(&ipath_sma_lock);
+}
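
(Aside: ipath_rcv_sma() above treats the SMA queue as a fixed-size ring that
overwrites the oldest unread packet on wrap, advancing ipath_sma_first past
it.  A stand-alone model of that discipline, with hypothetical names and
without the driver's locking:)

	#define NSLOTS 8	/* stands in for IPATH_NUM_SMAPKTS */

	static struct { int len; /* 0 means slot is free/read */ } ring[NSLOTS];
	static int first, next;	/* oldest unread, next write position */

	static void ring_put(int len)
	{
		if (ring[next].len) {		/* overwriting unread data */
			if (++first >= NSLOTS)	/* drop it; advance oldest */
				first = 0;
		}
		ring[next].len = len;
		if (++next >= NSLOTS)
			next = 0;
	}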
+static void ipath_rcv_layer(const ipath_type t, uint32_t etail,
+			    uint32_t tlen, ether_header_typ * hdr)
+{
+	uint32_t elen;
+	uint8_t pad, *bthbytes;
+	struct sk_buff *skb;
+	struct sk_buff *nskb;
+	ipath_devdata *dd = &devdata[t];
+	ipath_portdata *pd;
+	unsigned long pa, pent;
+	uint64_t *egrbase;
+	uint64_t lenvalid;	/* in words */
+
+	if (dd->ipath_port0_skbs && hdr->sub_opcode == OPCODE_ENCAP) {
+		/*
+		 * Allocate a new sk_buff to replace the one we give
+		 * to the network stack.
+		 */
+		if (!(nskb = dev_alloc_skb(dd->ipath_ibmaxlen + 4))) {
+			/* count OK packets that we drop */
+			ipath_stats.sps_krdrops++;
+			return;
+		}
+
+		bthbytes = (uint8_t *) hdr->bth;
+		pad = (bthbytes[1] >> 4) & 3;
+		/* +CRC32 */
+		elen = tlen - (sizeof(*hdr) + pad + sizeof(uint32_t));
+
+		skb_reserve(nskb, 4);
+
+		skb = dd->ipath_port0_skbs[etail];
+		dd->ipath_port0_skbs[etail] = nskb;
+		skb_put(skb, elen);
+
+		pd = dd->ipath_pd[0];
+		lenvalid = (dd->ipath_ibmaxlen - pd->port_egrskip) >> 2;
+		lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT;
+		lenvalid |= INFINIPATH_RT_VALID;
+		pa = virt_to_phys(nskb->data);
+		pa += pd->port_egrskip;
+		pent = (pa & INFINIPATH_RT_ADDR_MASK) | lenvalid;
+		/* This is simplified for port 0 */
+		egrbase = (uint64_t *) ((char *)(dd->ipath_kregbase) +
+					dd->ipath_rcvegrbase);
+		ipath_kput_memq(t, &egrbase[etail], pent);
+
+		dd->ipath_layer.l_rcv(t, hdr, skb);
+
+		/* another ether packet received */
+		ipath_stats.sps_ether_rpkts++;
+	} else if (hdr->sub_opcode == OPCODE_LID_ARP) {
+		if (dd->ipath_layer.l_rcv_lid)
+			dd->ipath_layer.l_rcv_lid(t, hdr);
+	}
+}
+
+/* called from interrupt handler for errors or receive interrupt */
+void ipath_kreceive(const ipath_type t)
+{
+	uint64_t *rc;
+	void *ebuf;
+	ipath_devdata *dd = &devdata[t];
+	const uint32_t rsize = dd->ipath_rcvhdrentsize;	/* words */
+	const uint32_t maxcnt = dd->ipath_rcvhdrcnt * rsize;	/* in words */
+	uint32_t etail = ~0U, l, hdrqtail, sma_this_time = 0;
+	ips_message_header_typ *hdr;
+	uint32_t eflags, i, etype, tlen, pkttot = 0;
+	static uint64_t totcalls;	/* stats, may eventually remove */
+	char emsg[128];
+
+	if (!dd->ipath_hdrqtailptr) {
+		_IPATH_UNIT_ERROR(t,
+				  "hdrqtailptr not set, can't do receives\n");
+		return;
+	}
+
+	if (test_and_set_bit(0, &dd->ipath_rcv_pending)) {
+		/* There is already a thread processing this queue. */
+		return;
+	}
+
+	if (dd->ipath_port0head == *dd->ipath_hdrqtailptr)
+		goto done;
+
+gotmore:
+	/*
+	 * read only once at start.  If in a flood situation, this helps
+	 * performance slightly.  If more arrive while we are processing,
+	 * we'll come back here and do them
+	 */
+	hdrqtail = *dd->ipath_hdrqtailptr;
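+
+	/*
+	 * The rcvhdrq is indexed in 32-bit words (hence the l << 2 byte
+	 * offset below); each entry's ips packet header starts at its
+	 * second 64-bit word, and the receive-header-flags fields are
+	 * read through the ips_get_* accessors.
+	 */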
+	for (i = 0, l = dd->ipath_port0head; l != hdrqtail; i++) {
+		uint32_t qp;
+		uint8_t *bthbytes;
+
+		rc = (uint64_t *) (dd->ipath_pd[0]->port_rcvhdrq + (l << 2));
+		hdr = (ips_message_header_typ *) &rc[1];
+		/*
+		 * could make a network order version of IPATH_KD_QP, and
+		 * do the obvious shift before masking to speed this up.
+		 */
+		qp = ntohl(hdr->bth[1]) & 0xffffff;
+		bthbytes = (uint8_t *) hdr->bth;
+
+		eflags = ips_get_hdr_err_flags(rc);
+		etype = ips_get_rcv_type(rc);
+		tlen = ips_get_length_in_bytes(rc);	/* total length */
+		ebuf = NULL;
+		if (etype != RCVHQ_RCV_TYPE_EXPECTED) {
+			/*
+			 * it turns out that the chip uses an eager buffer
+			 * for all non-expected packets, whether it "needs"
+			 * one or not.  So always get the index, but
+			 * don't set ebuf (so we try to copy data)
+			 * unless the length requires it.
+			 */
+			etail = ips_get_index(rc);
+			if (tlen > sizeof(*hdr)
+			    || etype == RCVHQ_RCV_TYPE_NON_KD) {
+				ebuf = ipath_get_egrbuf(t, etail, 0);
+			}
+		}
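+
+		/*
+		 * Now classify the packet: anything with serious RHF
+		 * errors just gets logged; otherwise QP 0/1 traffic goes
+		 * to the userland SMA (or the verbs layer), eager packets
+		 * carrying the layered opcode go to the ethernet emulation
+		 * driver, and the rest get debug output only.
+		 */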
+
+		/*
+		 * both tiderr and ipathhdrerr are set for all plain IB
+		 * packets; only ipathhdrerr should be set.
+		 */
+
+		if (etype != RCVHQ_RCV_TYPE_NON_KD
+		    && etype != RCVHQ_RCV_TYPE_ERROR
+		    && ips_get_ipath_ver(hdr->iph.ver_port_tid_offset) !=
+		    IPS_PROTO_VERSION) {
+			_IPATH_PDBG("Bad InfiniPath protocol version %x\n",
+				    ips_get_ipath_ver(hdr->iph.
+						      ver_port_tid_offset));
+		}
+
+		if (eflags &
+		    ~(INFINIPATH_RHF_H_TIDERR | INFINIPATH_RHF_H_IHDRERR)) {
+			get_rhf_errstring(eflags, emsg, sizeof emsg);
+			_IPATH_PDBG
+			    ("RHFerrs %x hdrqtail=%x typ=%u tlen=%x opcode=%x egridx=%x: %s\n",
+			     eflags, l, etype, tlen, bthbytes[0],
+			     ips_get_index(rc), emsg);
+		} else if (etype == RCVHQ_RCV_TYPE_NON_KD) {
+			/*
+			 * If there is a userland SMA and this is a MAD packet,
+			 * then pass it to the userland SMA.
+			 */
+			if (ipath_sma_alive && qp <= 1) {
+				/*
+				 * count OK packets that we drop because
+				 * SMA isn't yet running, or because we
+				 * are in an sma flood (no point in
+				 * constantly acquiring the spin lock, and
+				 * overwriting previous packets).
+				 * Eventually things will recover.
+				 * Similarly if the sma consumer is
+				 * so far behind that we would overwrite
+				 * (yes, it's outside the lock)
+				 */
+				if (!ipath_sma_data_spare ||
+				    ipath_sma_data[ipath_sma_next].len ||
+				    ++sma_this_time > IPATH_NUM_SMAPKTS) {
+					ipath_stats.sps_krdrops++;
+				} else if (ebuf) {
+					ipath_rcv_sma(t, tlen, rc, ebuf);
+				}
+			} else if (dd->verbs_layer.l_rcv) {
+				dd->verbs_layer.l_rcv(t, rc + 1, ebuf, tlen);
+			} else {
+				_IPATH_VDBG("received IB packet, not SMA (QP=%x)\n",
+					    qp);
+			}
+		} else if (etype == RCVHQ_RCV_TYPE_EAGER) {
+			if (qp == IPATH_KD_QP && bthbytes[0] ==
+			    dd->ipath_layer.l_rcv_opcode && ebuf)
+				ipath_rcv_layer(t, etail, tlen,
+						(ether_header_typ *) hdr);
+			else
+				_IPATH_PDBG
+				    ("typ %x, opcode %x (eager, qp=%x), len %x; ignored\n",
+				     etype, bthbytes[0], qp, tlen);
+		} else if (etype == RCVHQ_RCV_TYPE_EXPECTED) {
+			_IPATH_DBG("Bug: Expected TID, opcode %x; ignored\n",
+				   hdr->bth[0] & 0xff);
+		} else if (eflags &
+			   (INFINIPATH_RHF_H_TIDERR |
+			    INFINIPATH_RHF_H_IHDRERR)) {
+			/*
+			 * This is a type 3 packet, only the LRH is in
+			 * the rcvhdrq, the rest of the header is in
+			 * the eager buffer.
+			 */
+			uint8_t opcode;
+			if (ebuf) {
+				bthbytes = (uint8_t *) ebuf;
+				opcode = *bthbytes;
+			} else
+				opcode = 0;
+			get_rhf_errstring(eflags, emsg, sizeof emsg);
+			_IPATH_DBG
+			    ("Err %x (%s), opcode %x, egrbuf %x, len %x\n",
+			     eflags, emsg, opcode, etail, tlen);
+		} else {
+			/*
+			 * error packet, type of error unknown.
+			 * Probably type 3, but we don't know, so don't
+			 * even try to print the opcode, etc.
+			 */
+			_IPATH_DBG
+			    ("Error Pkt, but no eflags! egrbuf %x, len %x\n"
+			     "hdrq@%lx;hdrq+%x rhf: %llx; hdr %llx %llx %llx %llx %llx\n",
+			     etail, tlen, (unsigned long)rc, l, rc[0], rc[1],
+			     rc[2], rc[3], rc[4], rc[5]);
+		}
+		l += rsize;
+		if (l >= maxcnt)
+			l = 0;
+		/*
+		 * update for each packet, to help prevent overflows if we
+		 * have lots of packets.
+		 */
+		(void)ipath_kput_ureg(t, ur_rcvhdrhead, l, 0);
+		if (etype != RCVHQ_RCV_TYPE_EXPECTED)
+			(void)ipath_kput_ureg(t, ur_rcvegrindexhead,
+					      etail, 0);
+	}
+
+	pkttot += i;
+
+	dd->ipath_port0head = l;
+
+	if (hdrqtail != *dd->ipath_hdrqtailptr)
+		goto gotmore;	/* more arrived while we handled first batch */
+
+	if (pkttot > ipath_stats.sps_maxpkts_call)
+		ipath_stats.sps_maxpkts_call = pkttot;
+	ipath_stats.sps_port0pkts += pkttot;
+	ipath_stats.sps_avgpkts_call = ipath_stats.sps_port0pkts / ++totcalls;
+
+	if (sma_this_time)	/* only once at end, not each time */
+		wake_up_interruptible(&ipath_sma_wait);
+
+done:
+	clear_bit(0, &dd->ipath_rcv_pending);
+	smp_mb__after_clear_bit();
+}
+
+/*
+ * Update our shadow copy of the PIO availability register map, called
+ * whenever our local copy indicates we have run out of send buffers.
+ * NOTE: This can be called from interrupt context by ipath_bufavail()
+ * and from non-interrupt context by ipath_getpiobuf().
+ */
+static void ipath_update_pio_bufs(const ipath_type t)
+{
+	unsigned long flags;
+	int i;
+	const unsigned piobregs = (unsigned)devdata[t].ipath_pioavregs;
+
+	/*
+	 * If the generation (check) bits have changed, then we update the
+	 * busy bit for the corresponding PIO buffer.  This algorithm will
+	 * modify positions to the value they already have in some cases
+	 * (i.e., no change), but it's faster than changing only the bits
+	 * that have changed.
+	 *
+	 * We would like to do this atomically, to avoid spinlocks in the
+	 * critical send path, but that's not really possible, given the
+	 * type of changes, and that this routine could be called on
+	 * multiple cpu's simultaneously, so we lock in this routine only,
+	 * to avoid conflicting updates; all we change is the shadow, and
+	 * it's a single 64 bit memory location, so by definition the
+	 * update is atomic in terms of what other cpu's can see in
+	 * testing the bits.  The spin_lock overhead isn't too bad, since
+	 * it only happens when all buffers are in use, so only cpu
+	 * overhead, not latency or bandwidth is affected.
+	 */
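+	/*
+	 * every other bit of each availability qword is a generation
+	 * ("check") bit; the corresponding busy bit sits
+	 * INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT above it.  A busy bit that
+	 * is set in the shadow is copied from the DMAed value only where
+	 * the two copies' check bits agree.
+	 */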
+#define _IPATH_ALL_CHECKBITS 0x5555555555555555ULL
+	if (!devdata[t].ipath_pioavailregs_dma) {
+		_IPATH_DBG("Update shadow pioavail, but regs_dma NULL!\n");
+		return;
+	}
+	if (infinipath_debug & __IPATH_VERBDBG) {
+		/* only if packet debug and verbose */
+		_IPATH_PDBG("Refill avail, dma0=%llx shad0=%llx, "
+			    "d1=%llx s1=%llx, d2=%llx s2=%llx, d3=%llx s3=%llx\n",
+			    devdata[t].ipath_pioavailregs_dma[0],
+			    devdata[t].ipath_pioavailshadow[0],
+			    devdata[t].ipath_pioavailregs_dma[1],
+			    devdata[t].ipath_pioavailshadow[1],
+			    devdata[t].ipath_pioavailregs_dma[2],
+			    devdata[t].ipath_pioavailshadow[2],
+			    devdata[t].ipath_pioavailregs_dma[3],
+			    devdata[t].ipath_pioavailshadow[3]);
+		if (piobregs > 4)
+			_IPATH_PDBG("2nd group, dma4=%llx shad4=%llx, "
+				    "d5=%llx s5=%llx, d6=%llx s6=%llx, d7=%llx s7=%llx\n",
+				    devdata[t].ipath_pioavailregs_dma[4],
+				    devdata[t].ipath_pioavailshadow[4],
+				    devdata[t].ipath_pioavailregs_dma[5],
+				    devdata[t].ipath_pioavailshadow[5],
+				    devdata[t].ipath_pioavailregs_dma[6],
+				    devdata[t].ipath_pioavailshadow[6],
+				    devdata[t].ipath_pioavailregs_dma[7],
+				    devdata[t].ipath_pioavailshadow[7]);
+	}
+	spin_lock_irqsave(&ipath_pioavail_lock, flags);
+	for (i = 0; i < piobregs; i++) {
+		uint64_t pchbusy, pchg, piov, pnew;
+		/* Chip Errata: bug 6641; even and odd qwords>3 are swapped */
+		piov = devdata[t].ipath_pioavailregs_dma[i > 3 ? i ^ 1 : i];
+		pchg = _IPATH_ALL_CHECKBITS &
+			~(devdata[t].ipath_pioavailshadow[i] ^ piov);
+		pchbusy = pchg << INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT;
+		if (pchg && (pchbusy & devdata[t].ipath_pioavailshadow[i])) {
+			pnew = devdata[t].ipath_pioavailshadow[i] & ~pchbusy;
+			pnew |= piov & pchbusy;
+			devdata[t].ipath_pioavailshadow[i] = pnew;
+		}
+	}
+	spin_unlock_irqrestore(&ipath_pioavail_lock, flags);
+}
+
+static int ipath_do_user_init(ipath_portdata * pd,
+			      struct ipath_user_info *uinfo)
+{
+	int ret = 0;
+	ipath_type t = pd->port_unit;
+	ipath_devdata *dd = &devdata[t];
+	struct ipath_user_info kinfo;
+
+	if (copy_from_user(&kinfo, uinfo, sizeof kinfo))
+		ret = -EFAULT;
+	else {
+		/* for now, if major version is different, bail */
+		if ((kinfo.spu_userversion >> 16) != IPATH_USER_SWMAJOR) {
+			_IPATH_INFO
+			    ("User major version %d not same as driver major %d\n",
+			     kinfo.spu_userversion >> 16, IPATH_USER_SWMAJOR);
+			ret = -ENODEV;
+		} else {
+			if ((kinfo.spu_userversion & 0xffff) !=
+			    IPATH_USER_SWMINOR)
+				_IPATH_DBG
+				    ("User minor version %d not same as driver minor %d\n",
+				     kinfo.spu_userversion & 0xffff,
+				     IPATH_USER_SWMINOR);
+			if (kinfo.spu_rcvhdrsize) {
+				if ((ret = ipath_setrcvhdrsize(t,
+							       kinfo.spu_rcvhdrsize)))
+					goto done;
+			} else if (!dd->ipath_rcvhdrsize) {
+				/*
+				 * first user of field, kernel or user
+				 * code, and using default
+				 */
+				dd->ipath_rcvhdrsize = IPATH_DFLT_RCVHDRSIZE;
+				ipath_kput_kreg(pd->port_unit, kr_rcvhdrsize,
+						dd->ipath_rcvhdrsize);
+				_IPATH_VDBG
+				    ("Use default protocol header size %u\n",
+				     dd->ipath_rcvhdrsize);
+			}
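+
+			/*
+			 * spu_egrskip lets the user reserve space at the
+			 * start of each eager buffer; it must be a whole
+			 * number of 32-bit words.
+			 */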
+			pd->port_egrskip = kinfo.spu_egrskip;
+			if (pd->port_egrskip) {
+				if (pd->port_egrskip & 3) {
+					_IPATH_DBG
+					    ("eager skip 0x%x invalid, must be word multiple; using 0x%x\n",
+					     pd->port_egrskip,
+					     pd->port_egrskip & ~3);
+					pd->port_egrskip &= ~3;
+				}
+				_IPATH_DBG
+				    ("user reserves 0x%x bytes at start of eager TIDs\n",
+				     pd->port_egrskip);
+			}
+
+			/*
+			 * for now we do nothing with rcvhdrcnt:
+			 * kinfo.spu_rcvhdrcnt
+			 */
+
+			/*
+			 * set up for the rcvhdr Q tail register writeback
+			 * to user memory
+			 */
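+			/*
+			 * the chip DMAs the current rcvhdrq tail index to
+			 * this user-supplied address, so user code can poll
+			 * ordinary memory for new packets instead of
+			 * reading a chip register.
+			 */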
+			if (kinfo.spu_rcvhdraddr &&
+			    access_ok(VERIFY_WRITE, kinfo.spu_rcvhdraddr,
+				      sizeof(uint64_t))) {
+				uint64_t physaddr, uaddr, off, atmp;
+				struct page *pagep;
+				off = offset_in_page(kinfo.spu_rcvhdraddr);
+				uaddr = PAGE_MASK &
+					(unsigned long)kinfo.spu_rcvhdraddr;
+				if ((ret = ipath_mlock_nocopy(uaddr, &pagep))) {
+					_IPATH_INFO
+					    ("Failed to lookup and lock address %llx for rcvhdrtail: errno %d\n",
+					     kinfo.spu_rcvhdraddr, -ret);
+					goto done;
+				}
+				ipath_stats.sps_pagelocks++;
+				pd->port_rcvhdrtail_uaddr = uaddr;
+				pd->port_rcvhdrtail_pagep = pagep;
+				pd->port_rcvhdrtail_kvaddr =
+					page_address(pagep);
+				pd->port_rcvhdrtail_kvaddr += off;
+				physaddr = page_to_phys(pagep) + off;
+				_IPATH_VDBG
+				    ("port %d user addr %llx hdrtailaddr, %llx physical (off=%llx)\n",
+				     pd->port_port, kinfo.spu_rcvhdraddr,
+				     physaddr, off);
+				ipath_kput_kreg_port(t, kr_rcvhdrtailaddr,
+						     pd->port_port, physaddr);
+				atmp = ipath_kget_kreg64_port(t,
+							      kr_rcvhdrtailaddr,
+							      pd->port_port);
+				if (physaddr != atmp) {
+					_IPATH_UNIT_ERROR(t,
+							  "Catastrophic software error, RcvHdrTailAddr%u written as %llx, read back as %llx\n",
+							  pd->port_port,
+							  physaddr, atmp);
+					ret = -EINVAL;
+					goto done;
+				}
+			} else {
+				_IPATH_DBG
+				    ("Port %d rcvhdrtail addr %llx not valid\n",
+				     pd->port_port, kinfo.spu_rcvhdraddr);
+				ret = -EINVAL;
+				goto done;
+			}
+
+			/*
+			 * for right now, kernel piobufs are at end,
+			 * so port 1 is at 0
+			 */
+			pd->port_piobufs = dd->ipath_piobufbase +
+				dd->ipath_pbufsport * (pd->port_port - 1) *
+				dd->ipath_palign;
+			_IPATH_VDBG("Set base of piobufs for port %u to 0x%x\n",
+				    pd->port_port, pd->port_piobufs);
+
+			/*
+			 * Now allocate the rcvhdr Q and eager TIDs;
+			 * skip the TID array for the time being.
+			 * If pd->port_port exceeds what the chip supports,
+			 * we will need extra logic here, handling overflow
+			 * through port 0, someday.
+			 */
+			if (!(ret = ipath_create_rcvhdrq(pd)))
+				ret = ipath_create_user_egr(pd);
+			if (!ret) {	/* enable receives now */
+				uint64_t head;
+				uint32_t head32;
+				/* atomically set enable bit for this port */
+				atomic_set_mask(1U <<
+						(INFINIPATH_R_PORTENABLE_SHIFT
+						 + pd->port_port),
+						&dd->ipath_rcvctrl);
+
+				/*
+				 * set the head registers for this port
+				 * to the current values of the tail
+				 * pointers, since we don't know if they
+				 * were updated on last use of the port.
+				 */
+				head32 = ipath_kget_ureg32(t, ur_rcvhdrtail,
+							   pd->port_port);
+				head = (uint64_t) head32;
+				ipath_kput_ureg(t, ur_rcvhdrhead, head,
+						pd->port_port);
+				head32 = ipath_kget_ureg32(t,
+							   ur_rcvegrindextail,
+							   pd->port_port);
+				ipath_kput_ureg(t, ur_rcvegrindexhead, head32,
+						pd->port_port);
+				dd->ipath_lastegrheads[pd->port_port] = ~0;
+				dd->ipath_lastrcvhdrqtails[pd->port_port] = ~0;
+				_IPATH_VDBG
+				    ("Wrote port%d head %llx, egrhead %x from tail regs\n",
+				     pd->port_port, head, head32);
+				/* start at beginning after open */
+				pd->port_tidcursor = 0;
+				{
+					/*
+					 * now enable the port; the tail
+					 * registers will be written to
+					 * memory by the chip as soon
+					 * as it sees the write to
+					 * kr_rcvctrl.  The update only
+					 * happens on transition from 0
+					 * to 1, so clear it first, then
+					 * set it as part of enabling
+					 * the port.  This will (very
+					 * briefly) affect any other open
+					 * ports, but it shouldn't be long
+					 * enough to be an issue.
+					 */
+					ipath_kput_kreg(t, kr_rcvctrl,
+							dd->ipath_rcvctrl &
+							~INFINIPATH_R_TAILUPD);
+					ipath_kput_kreg(t, kr_rcvctrl,
+							dd->ipath_rcvctrl);
+				}
+			}
+		}
+	}
+
+done:
+	return ret;
+}
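+
+/*
+ * Fill in a struct ipath_base_info and copy it to user space; this is
+ * everything the user library needs to locate and mmap the port's
+ * receive queues, eager buffers, and PIO send buffers.
+ */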
+static int ipath_get_baseinfo(ipath_portdata * pd,
+			      struct ipath_base_info *ubase)
+{
+	int ret = 0;
+	struct ipath_base_info kbase;
+	ipath_devdata *dd = &devdata[pd->port_unit];
+
+	/* be sure anything we don't set is 0ed */
+	memset(&kbase, 0, sizeof kbase);
+	kbase.spi_rcvhdr_cnt = dd->ipath_rcvhdrcnt;
+	kbase.spi_rcvhdrent_size = dd->ipath_rcvhdrentsize;
+	kbase.spi_tidegrcnt = dd->ipath_rcvegrcnt;
+	kbase.spi_rcv_egrbufsize = dd->ipath_rcvegrbufsize;
+	/* have to mmap whole thing */
+	kbase.spi_rcv_egrbuftotlen = pd->port_rcvegrbuf_chunks *
+		PAGE_SIZE * (1 << pd->port_rcvegrbuf_order);
+	kbase.spi_rcv_egrperchunk = pd->port_rcvegrbufs_perchunk;
+	kbase.spi_rcv_egrchunksize = kbase.spi_rcv_egrbuftotlen /
+		pd->port_rcvegrbuf_chunks;
+	kbase.spi_tidcnt = dd->ipath_rcvtidcnt;
+	/*
+	 * for this use, may be ipath_cfgports summed over all chips
+	 * that are configured and present
+	 */
+	kbase.spi_nports = dd->ipath_cfgports;
+	kbase.spi_unit = pd->port_unit;	/* unit (chip/board) our port is on */
+	/* for now, only a single page */
+	kbase.spi_tid_maxsize = PAGE_SIZE;
+
+	/*
+	 * doing this per port, and based on the skip value, etc.
+	 * This has to be the actual buffer size, since the protocol
+	 * code treats it as an array.
+	 *
+	 * These have to be set to user addresses in the user code via
+	 * mmap.  These values are used on return to user code for the
+	 * mmap target addresses only.  For 32 bit, same 44 bit address
+	 * problem, so use the physical address, not virtual.  Before
+	 * 2.6.11, using the page_address() macro worked, but in 2.6.11,
+	 * even that returns the full 64 bit address (upper bits all 1's).
+	 * So far, using the physical addresses (or chip offsets, for
+	 * chip mapping) works, but no doubt some future kernel release
+	 * will change that, and we'll be on to yet another method of
+	 * dealing with this.
+	 */
+	kbase.spi_rcvhdr_base = (uint64_t) pd->port_rcvhdrq_phys;
+	kbase.spi_rcv_egrbufs = (uint64_t) pd->port_rcvegr_phys;
+	kbase.spi_pioavailaddr = (uint64_t) dd->ipath_pioavailregs_phys;
+	kbase.spi_status = (uint64_t) kbase.spi_pioavailaddr +
+		(void *)dd->ipath_statusp - (void *)dd->ipath_pioavailregs_dma;
+	kbase.spi_piobufbase = (uint64_t) pd->port_piobufs;
+	kbase.__spi_uregbase =
+		dd->ipath_uregbase + dd->ipath_palign * pd->port_port;
+
+	kbase.spi_pioindex = dd->ipath_pbufsport * (pd->port_port - 1);
+	kbase.spi_piocnt = dd->ipath_pbufsport;
+	kbase.spi_pioalign = dd->ipath_palign;
+
+	kbase.spi_qpair = IPATH_KD_QP;
+	kbase.spi_piosize = dd->ipath_ibmaxlen;
+	kbase.spi_mtu = dd->ipath_ibmaxlen;	/* maxlen, not ibmtu */
+	kbase.spi_port = pd->port_port;
+	kbase.spi_sw_version = IPATH_KERN_SWVERSION;
+	kbase.spi_hw_version = dd->ipath_revision;
+
+	if (copy_to_user(ubase, &kbase, sizeof kbase))
+		ret = -EFAULT;
+
+	return ret;
+}
+
+/*
+ * return number of units supported by driver.  This is infinipath_max,
+ * unless there are no initted units.
+ */
+static int ipath_get_units(void)
+{
+	int i;
+
+	for (i = 0; i < infinipath_max; i++)
+		if (devdata[i].ipath_flags & IPATH_INITTED)
+			return infinipath_max;
+	return 0;
+}
+
+/* write data to the EEPROM on the board */
+static int ipath_wr_eeprom(ipath_portdata * pd, struct ipath_eeprom_req *req)
+{
+	int ret = 0;
+	struct ipath_eeprom_req kreq;
+	void *buf = NULL;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;	/* not just any old user can write flash */
+	if (copy_from_user(&kreq, req, sizeof kreq))
+		return -EFAULT;
+	if (!kreq.addr || (kreq.offset + kreq.len) > 128) {
+		_IPATH_DBG
+		    ("called with NULL addr %llx, or bad cnt %u or offset %u\n",
+		     kreq.addr, kreq.len, kreq.offset);
+		return -EINVAL;
+	}
+
+	if (!(buf = vmalloc(kreq.len))) {
+		ret = -ENOMEM;
+		_IPATH_UNIT_ERROR(pd->port_unit,
+				  "Couldn't allocate memory to write %u bytes to eeprom\n",
+				  kreq.len);
+		goto done;
+	}
+	if (copy_from_user(buf, (void *)kreq.addr, kreq.len)) {
+		ret = -EFAULT;
+		goto done;
+	}
+	if (ipath_eeprom_write(pd->port_unit, kreq.offset, buf, kreq.len)) {
+		ret = -ENXIO;
+		_IPATH_UNIT_ERROR(pd->port_unit,
+				  "Failed write to eeprom %u bytes offset %u\n",
+				  kreq.len, kreq.offset);
+	}
+
+done:
+	if (buf)
+		vfree(buf);
+	return ret;
+}
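+
+/*
+ * As with the write path above, read data is bounced through a
+ * vmalloc'ed kernel buffer, and offset + len is limited to the 128
+ * bytes of EEPROM exposed through this interface.
+ */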
+/* read data from the EEPROM on the board */
+int ipath_rd_eeprom(const ipath_type port_unit, struct ipath_eeprom_req *req)
+{
+	int ret = 0;
+	struct ipath_eeprom_req kreq;
+	void *buf = NULL;
+
+	if (copy_from_user(&kreq, req, sizeof kreq))
+		return -EFAULT;
+	if (!kreq.addr || (kreq.offset + kreq.len) > 128) {
+		_IPATH_DBG
+		    ("called with NULL addr %llx, or bad cnt %u or offset %u\n",
+		     kreq.addr, kreq.len, kreq.offset);
+		return -EINVAL;
+	}
+
+	if (!(buf = vmalloc(kreq.len))) {
+		ret = -ENOMEM;
+		_IPATH_UNIT_ERROR(port_unit,
+				  "Couldn't allocate memory to read %u bytes from eeprom\n",
+				  kreq.len);
+		goto done;
+	}
+	if (ipath_eeprom_read(port_unit, kreq.offset, buf, kreq.len)) {
+		ret = -ENXIO;
+		_IPATH_UNIT_ERROR(port_unit,
+				  "Failed reading %u bytes offset %u from eeprom\n",
+				  kreq.len, kreq.offset);
+	}
+	if (copy_to_user((void *)kreq.addr, buf, kreq.len))
+		ret = -EFAULT;
+
+done:
+	if (buf)
+		vfree(buf);
+	return ret;
+}
+
+/*
+ * wait for something to happen on a port.  Currently this is a PIO
+ * buffer becoming available, or a packet being received.  For now, at
+ * least, we wait no longer than 1/2 second on rcv, 1 tick on PIO, so
+ * we recover from any bugs (or, as we see in ips.c init and close,
+ * cases where the other side isn't yet ready).
+ * NOTE: currently called only with PIO or RCV, never both, so the path
+ * with both has not been tested.
+ */
+static int ipath_wait_intr(ipath_portdata * pd, uint32_t flag)
+{
+	ipath_devdata *dd = &devdata[pd->port_unit];
+	/* stupid compiler can't tell it's initialized */
+	uint32_t im = 0;
+	uint32_t head, tail, timeo = 0, wflag = 0;
+
+	if (!(flag & (IPATH_WAIT_RCV | IPATH_WAIT_PIO)))
+		return -EINVAL;
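+	/*
+	 * for a receive wait, the caller passes the rcvhdrq head index it
+	 * last saw in the upper 16 bits of flag; we only sleep if the
+	 * queue still appears empty (head == tail) once the interrupt is
+	 * enabled.
+	 */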
+	if (flag & IPATH_WAIT_RCV) {
+		head = flag >> 16;
+		im = (1U << pd->port_port) << INFINIPATH_R_INTRAVAIL_SHIFT;
+		atomic_set_mask(im, &dd->ipath_rcvctrl);
+		/*
+		 * now, before blocking, make sure that head is still == tail,
+		 * reading from the chip, so we can be sure the interrupt
+		 * enable has made it to the chip.  If not equal, disable
+		 * the interrupt again and return immediately.  This avoids
+		 * races, and the overhead of the chip read doesn't
+		 * matter much at this point, since we are waiting for
+		 * something anyway.
+		 */
+		ipath_kput_kreg(pd->port_unit, kr_rcvctrl, dd->ipath_rcvctrl);
+		tail = ipath_kget_ureg32(pd->port_unit, ur_rcvhdrtail,
+					 pd->port_port);
+		if (tail == head) {
+			timeo = HZ / 2;
+			wflag = IPATH_PORT_WAITING_RCV;
+		} else {
+			atomic_clear_mask(im, &dd->ipath_rcvctrl);
+			ipath_kput_kreg(pd->port_unit, kr_rcvctrl,
+					dd->ipath_rcvctrl);
+		}
+	}
+	if (flag & IPATH_WAIT_PIO) {
+		/*
+		 * this one's a bit worse than the receive case, in that we
+		 * can't really verify that at least one interrupt
+		 * will happen...
+		 * We do use a really short timeout, however
+		 */
+		timeo = 1;	/* if both, the short PIO timeout wins */
+		atomic_set_mask(1U << pd->port_port, &dd->ipath_portpiowait);
+		wflag |= IPATH_PORT_WAITING_PIO;
+		/*
+		 * this has a possible race with the ipath stuff, so do
+		 * it atomically
+		 */
+		atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL,
+				&dd->ipath_sendctrl);
+		ipath_kput_kreg(pd->port_unit, kr_sendctrl, dd->ipath_sendctrl);
+	}
+	if (wflag) {
+		pd->port_flag |= wflag;
+		wait_event_interruptible_timeout(pd->port_wait,
+						 (pd->port_flag & wflag) !=
+						 wflag, timeo);
+		if (wflag & pd->port_flag & IPATH_PORT_WAITING_PIO) {
+			/* timed out, no PIO interrupts */
+			atomic_clear_mask(IPATH_PORT_WAITING_PIO,
+					  &pd->port_flag);
+			pd->port_piowait_to++;
+			atomic_clear_mask(1U << pd->port_port,
+					  &dd->ipath_portpiowait);
+			/*
+			 * *don't* clear the pio interrupt enable;
+			 * let that happen in the interrupt handler;
+			 * else we have a race condition.
+			 */
+		}
+		if (wflag & pd->port_flag & IPATH_PORT_WAITING_RCV) {
+			/* timed out, no packets received */
+			atomic_clear_mask(IPATH_PORT_WAITING_RCV,
+					  &pd->port_flag);
+			pd->port_rcvwait_to++;
+			atomic_clear_mask(im, &dd->ipath_rcvctrl);
+			ipath_kput_kreg(pd->port_unit, kr_rcvctrl,
+					dd->ipath_rcvctrl);
+		}
+	} else {
+		/* else it's already happened, don't do wait_event overhead */
+		if (flag & IPATH_WAIT_RCV)
+			pd->port_rcvnowait++;
+		if (flag & IPATH_WAIT_PIO)
+			pd->port_pionowait++;
+	}
+	return 0;
+}
-- 
0.99.9n