LKML: "Paul E. McKenney": Re: [PATCH 10/13] [RFC] ipath verbs, part 1
McKenney"</a><ul><li><a href="/lkml/2005/12/18/92">Robert Walsh</a></li><li><a href="/lkml/2005/12/19/163">Ralph Campbell</a></li></ul></li></ul></li><li><a href="/lkml/2005/12/17/75">Andrew Morton</a><ul><li><a href="/lkml/2005/12/20/295">Robert Walsh</a></li></ul></li></ul></li></ul></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerl.gif" width="32" height="32" alt="/" /></td><td class="c" rowspan="2" valign="top" style="padding-top: 1em"><table><tr><td><table><tr><td class="lp">Date</td><td class="rp" itemprop="datePublished">Sun, 18 Dec 2005 11:59:22 -0800</td></tr><tr><td class="lp">From</td><td class="rp" itemprop="author">"Paul E. McKenney" <></td></tr><tr><td class="lp">Subject</td><td class="rp" itemprop="name">Re: [PATCH 10/13] [RFC] ipath verbs, part 1</td></tr></table></td><td></td></tr></table><pre itemprop="articleBody">On Fri, Dec 16, 2005 at 03:48:55PM -0800, Roland Dreier wrote:<br />> First half of ipath verbs driver<br /><br />Some RCU-related questions interspersed. Basic question is "where is<br />the lock-free read-side traversal?"<br /><br /> Thanx, Paul<br /><br />> ---<br />> <br />> drivers/infiniband/hw/ipath/ipath_verbs.c | 3244 +++++++++++++++++++++++++++++<br />> 1 files changed, 3244 insertions(+), 0 deletions(-)<br />> create mode 100644 drivers/infiniband/hw/ipath/ipath_verbs.c<br />> <br />> 72075ecec75f8c42e444a7d7d8ffcf340a845b96<br />> diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c<br />> new file mode 100644<br />> index 0000000..808326e<br />> --- /dev/null<br />> +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c<br />> @@ -0,0 +1,3244 @@<br />> +/*<br />> + * Copyright (c) 2005. PathScale, Inc. All rights reserved.<br />> + *<br />> + * This software is available to you under a choice of one of two<br />> + * licenses. You may choose to be licensed under the terms of the GNU<br />> + * General Public License (GPL) Version 2, available from the file<br />> + * COPYING in the main directory of this source tree, or the<br />> + * OpenIB.org BSD license below:<br />> + *<br />> + * Redistribution and use in source and binary forms, with or<br />> + * without modification, are permitted provided that the following<br />> + * conditions are met:<br />> + *<br />> + * - Redistributions of source code must retain the above<br />> + * copyright notice, this list of conditions and the following<br />> + * disclaimer.<br />> + *<br />> + * - Redistributions in binary form must reproduce the above<br />> + * copyright notice, this list of conditions and the following<br />> + * disclaimer in the documentation and/or other materials<br />> + * provided with the distribution.<br />> + *<br />> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,<br />> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF<br />> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND<br />> + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS<br />> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN<br />> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN<br />> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE<br />> + * SOFTWARE.<br />> + *<br />> + * Patent licenses, if any, provided herein do not apply to<br />> + * combinations of this program with other software, or any other<br />> + * product whatsoever.<br />> + *<br />> + * $Id: ipath_verbs.c 4491 2005-12-15 22:20:31Z rjwalsh $<br />> + */<br />> +<br />> +#include <linux/config.h><br />> +#include <linux/version.h><br />> +#include <linux/pci.h><br />> +#include <linux/err.h><br />> +#include <rdma/ib_pack.h><br />> +#include <rdma/ib_smi.h><br />> +#include <rdma/ib_mad.h><br />> +#include <rdma/ib_user_verbs.h><br />> +<br />> +#include <asm/uaccess.h><br />> +#include <asm-generic/bug.h><br />> +<br />> +#include "ipath_common.h"<br />> +#include "ips_common.h"<br />> +#include "ipath_layer.h"<br />> +#include "ipath_verbs.h"<br />> +<br />> +/*<br />> + * Compare the lower 24 bits of the two values.<br />> + * Returns an integer <, ==, or > than zero.<br />> + */<br />> +static inline int cmp24(u32 a, u32 b)<br />> +{<br />> + return (((int) a) - ((int) b)) << 8;<br />> +}<br />> +<br />> +#define MODNAME "ib_ipath"<br />> +#define DRIVER_LOAD_MSG "PathScale " MODNAME " loaded: "<br />> +#define PFX MODNAME ": "<br />> +<br />> +<br />> +/* Not static, because we don't want the compiler removing it */<br />> +const char ipath_verbs_version[] = "ipath_verbs " _IPATH_IDSTR;<br />> +<br />> +unsigned int ib_ipath_qp_table_size = 251;<br />> +module_param(ib_ipath_qp_table_size, uint, 0444);<br />> +MODULE_PARM_DESC(ib_ipath_qp_table_size, "QP table size");<br />> +<br />> +unsigned int ib_ipath_lkey_table_size = 12;<br />> +module_param(ib_ipath_lkey_table_size, uint, 0444);<br />> +MODULE_PARM_DESC(ib_ipath_lkey_table_size,<br />> + "LKEY table size in bits (2^n, 1 <= n <= 23)");<br />> +<br />> +unsigned int ib_ipath_debug; /* debug mask */<br />> +module_param(ib_ipath_debug, uint, 0644);<br />> +MODULE_PARM_DESC(ib_ipath_debug, "Verbs debug mask");<br />> +<br />> +<br />> +static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_sge_state *ss,<br />> + u32 len, struct ib_send_wr *wr, struct ib_wc *wc);<br />> +static void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc);<br />> +static int ipath_destroy_qp(struct ib_qp *ibqp);<br />> +<br />> +MODULE_LICENSE("GPL");<br />> +MODULE_AUTHOR("PathScale <infinipath-support@pathscale.com>");<br />> +MODULE_DESCRIPTION("Pathscale InfiniPath driver");<br />> +<br />> +enum {<br />> + IPATH_FAULT_RC_DROP_SEND_F = 1,<br />> + IPATH_FAULT_RC_DROP_SEND_M,<br />> + IPATH_FAULT_RC_DROP_SEND_L,<br />> + IPATH_FAULT_RC_DROP_SEND_O,<br />> + IPATH_FAULT_RC_DROP_RDMA_WRITE_F,<br />> + IPATH_FAULT_RC_DROP_RDMA_WRITE_M,<br />> + IPATH_FAULT_RC_DROP_RDMA_WRITE_L,<br />> + IPATH_FAULT_RC_DROP_RDMA_WRITE_O,<br />> + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_F,<br />> + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_M,<br />> + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_L,<br />> + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_O,<br />> + IPATH_FAULT_RC_DROP_ACK,<br />> +};<br />> +<br />> +enum {<br />> + IPATH_TRANS_INVALID = 0,<br />> + IPATH_TRANS_ANY2RST,<br />> + IPATH_TRANS_RST2INIT,<br />> + IPATH_TRANS_INIT2INIT,<br />> + IPATH_TRANS_INIT2RTR,<br />> + IPATH_TRANS_RTR2RTS,<br />> + IPATH_TRANS_RTS2RTS,<br />> + IPATH_TRANS_SQERR2RTS,<br 
/>> + IPATH_TRANS_ANY2ERR,<br />> + IPATH_TRANS_RTS2SQD, /* XXX Wait for expected ACKs & signal event */<br />> + IPATH_TRANS_SQD2SQD, /* error if not drained & parameter change */<br />> + IPATH_TRANS_SQD2RTS, /* error if not drained */<br />> +};<br />> +<br />> +enum {<br />> + IPATH_POST_SEND_OK = 0x0001,<br />> + IPATH_POST_RECV_OK = 0x0002,<br />> + IPATH_PROCESS_RECV_OK = 0x0004,<br />> + IPATH_PROCESS_SEND_OK = 0x0008,<br />> +};<br />> +<br />> +static int state_ops[IB_QPS_ERR + 1] = {<br />> + [IB_QPS_RESET] = 0,<br />> + [IB_QPS_INIT] = IPATH_POST_RECV_OK,<br />> + [IB_QPS_RTR] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK,<br />> + [IB_QPS_RTS] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK |<br />> + IPATH_POST_SEND_OK | IPATH_PROCESS_SEND_OK,<br />> + [IB_QPS_SQD] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK |<br />> + IPATH_POST_SEND_OK,<br />> + [IB_QPS_SQE] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK,<br />> + [IB_QPS_ERR] = 0,<br />> +};<br />> +<br />> +/*<br />> + * Convert the AETH credit code into the number of credits.<br />> + */<br />> +static u32 credit_table[31] = {<br />> + 0, /* 0 */<br />> + 1, /* 1 */<br />> + 2, /* 2 */<br />> + 3, /* 3 */<br />> + 4, /* 4 */<br />> + 6, /* 5 */<br />> + 8, /* 6 */<br />> + 12, /* 7 */<br />> + 16, /* 8 */<br />> + 24, /* 9 */<br />> + 32, /* A */<br />> + 48, /* B */<br />> + 64, /* C */<br />> + 96, /* D */<br />> + 128, /* E */<br />> + 192, /* F */<br />> + 256, /* 10 */<br />> + 384, /* 11 */<br />> + 512, /* 12 */<br />> + 768, /* 13 */<br />> + 1024, /* 14 */<br />> + 1536, /* 15 */<br />> + 2048, /* 16 */<br />> + 3072, /* 17 */<br />> + 4096, /* 18 */<br />> + 6144, /* 19 */<br />> + 8192, /* 1A */<br />> + 12288, /* 1B */<br />> + 16384, /* 1C */<br />> + 24576, /* 1D */<br />> + 32768 /* 1E */<br />> +};<br />> +<br />> +/*<br />> + * Convert the AETH RNR timeout code into the number of milliseconds.<br />> + */<br />> +static u32 rnr_table[32] = {<br />> + 656, /* 0 */<br />> + 1, /* 1 */<br />> + 1, /* 2 */<br />> + 1, /* 3 */<br />> + 1, /* 4 */<br />> + 1, /* 5 */<br />> + 1, /* 6 */<br />> + 1, /* 7 */<br />> + 1, /* 8 */<br />> + 1, /* 9 */<br />> + 1, /* A */<br />> + 1, /* B */<br />> + 1, /* C */<br />> + 1, /* D */<br />> + 2, /* E */<br />> + 2, /* F */<br />> + 3, /* 10 */<br />> + 4, /* 11 */<br />> + 6, /* 12 */<br />> + 8, /* 13 */<br />> + 11, /* 14 */<br />> + 16, /* 15 */<br />> + 21, /* 16 */<br />> + 31, /* 17 */<br />> + 41, /* 18 */<br />> + 62, /* 19 */<br />> + 82, /* 1A */<br />> + 123, /* 1B */<br />> + 164, /* 1C */<br />> + 246, /* 1D */<br />> + 328, /* 1E */<br />> + 492 /* 1F */<br />> +};<br />> +<br />> +/*<br />> + * Translate ib_wr_opcode into ib_wc_opcode.<br />> + */<br />> +static enum ib_wc_opcode wc_opcode[] = {<br />> + [IB_WR_RDMA_WRITE] = IB_WC_RDMA_WRITE,<br />> + [IB_WR_RDMA_WRITE_WITH_IMM] = IB_WC_RDMA_WRITE,<br />> + [IB_WR_SEND] = IB_WC_SEND,<br />> + [IB_WR_SEND_WITH_IMM] = IB_WC_SEND,<br />> + [IB_WR_RDMA_READ] = IB_WC_RDMA_READ,<br />> + [IB_WR_ATOMIC_CMP_AND_SWP] = IB_WC_COMP_SWAP,<br />> + [IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD<br />> +};<br />> +<br />> +/*<br />> + * Array of device pointers.<br />> + */<br />> +static uint32_t number_of_devices;<br />> +static struct ipath_ibdev **ipath_devices;<br />> +<br />> +/*<br />> + * Global table of GID to attached QPs.<br />> + * The table is global to all ipath devices since a send from one QP/device<br />> + * needs to be locally routed to any locally attached QPs on the same<br />> + * or different 
device.<br />> + */<br />> +static struct rb_root mcast_tree;<br />> +static spinlock_t mcast_lock = SPIN_LOCK_UNLOCKED;<br />> +<br />> +/*<br />> + * Allocate a structure to link a QP to the multicast GID structure.<br />> + */<br />> +static struct ipath_mcast_qp *ipath_mcast_qp_alloc(struct ipath_qp *qp)<br />> +{<br />> + struct ipath_mcast_qp *mqp;<br />> +<br />> + mqp = kmalloc(sizeof(*mqp), GFP_KERNEL);<br />> + if (!mqp)<br />> + return NULL;<br />> +<br />> + mqp->qp = qp;<br />> + atomic_inc(&qp->refcount);<br />> +<br />> + return mqp;<br />> +}<br />> +<br />> +static void ipath_mcast_qp_free(struct ipath_mcast_qp *mqp)<br />> +{<br />> + struct ipath_qp *qp = mqp->qp;<br />> +<br />> + /* Notify ipath_destroy_qp() if it is waiting. */<br />> + if (atomic_dec_and_test(&qp->refcount))<br />> + wake_up(&qp->wait);<br />> +<br />> + kfree(mqp);<br />> +}<br />> +<br />> +/*<br />> + * Allocate a structure for the multicast GID.<br />> + * A list of QPs will be attached to this structure.<br />> + */<br />> +static struct ipath_mcast *ipath_mcast_alloc(union ib_gid *mgid)<br />> +{<br />> + struct ipath_mcast *mcast;<br />> +<br />> + mcast = kmalloc(sizeof(*mcast), GFP_KERNEL);<br />> + if (!mcast)<br />> + return NULL;<br />> +<br />> + mcast->mgid = *mgid;<br />> + INIT_LIST_HEAD(&mcast->qp_list);<br />> + init_waitqueue_head(&mcast->wait);<br />> + atomic_set(&mcast->refcount, 0);<br />> +<br />> + return mcast;<br />> +}<br />> +<br />> +static void ipath_mcast_free(struct ipath_mcast *mcast)<br />> +{<br />> + struct ipath_mcast_qp *p, *tmp;<br />> +<br />> + list_for_each_entry_safe(p, tmp, &mcast->qp_list, list)<br />> + ipath_mcast_qp_free(p);<br />> +<br />> + kfree(mcast);<br />> +}<br />> +<br />> +/*<br />> + * Search the global table for the given multicast GID.<br />> + * Return it or NULL if not found.<br />> + * The caller is responsible for decrementing the reference count if found.<br />> + */<br />> +static struct ipath_mcast *ipath_mcast_find(union ib_gid *mgid)<br />> +{<br />> + struct rb_node *n;<br />> + unsigned long flags;<br />> +<br />> + spin_lock_irqsave(&mcast_lock, flags);<br />> + n = mcast_tree.rb_node;<br />> + while (n) {<br />> + struct ipath_mcast *mcast;<br />> + int ret;<br />> +<br />> + mcast = rb_entry(n, struct ipath_mcast, rb_node);<br />> +<br />> + ret = memcmp(mgid->raw, mcast->mgid.raw, sizeof(union ib_gid));<br />> + if (ret < 0)<br />> + n = n->rb_left;<br />> + else if (ret > 0)<br />> + n = n->rb_right;<br />> + else {<br />> + atomic_inc(&mcast->refcount);<br />> + spin_unlock_irqrestore(&mcast_lock, flags);<br />> + return mcast;<br />> + }<br />> + }<br />> + spin_unlock_irqrestore(&mcast_lock, flags);<br />> +<br />> + return NULL;<br />> +}<br />> +<br />> +/*<br />> + * Insert the multicast GID into the table and<br />> + * attach the QP structure.<br />> + * Return zero if both were added.<br />> + * Return EEXIST if the GID was already in the table but the QP was added.<br />> + * Return ESRCH if the QP was already attached and neither structure was added.<br />> + */<br />> +static int ipath_mcast_add(struct ipath_mcast *mcast,<br />> + struct ipath_mcast_qp *mqp)<br />> +{<br />> + struct rb_node **n = &mcast_tree.rb_node;<br />> + struct rb_node *pn = NULL;<br />> + unsigned long flags;<br />> +<br />> + spin_lock_irqsave(&mcast_lock, flags);<br />> +<br />> + while (*n) {<br />> + struct ipath_mcast *tmcast;<br />> + struct ipath_mcast_qp *p;<br />> + int ret;<br />> +<br />> + pn = *n;<br />> + tmcast = 
rb_entry(pn, struct ipath_mcast, rb_node);<br />> +<br />> + ret = memcmp(mcast->mgid.raw, tmcast->mgid.raw,<br />> + sizeof(union ib_gid));<br />> + if (ret < 0) {<br />> + n = &pn->rb_left;<br />> + continue;<br />> + }<br />> + if (ret > 0) {<br />> + n = &pn->rb_right;<br />> + continue;<br />> + }<br />> +<br />> + /* Search the QP list to see if this is already there. */<br />> + list_for_each_entry_rcu(p, &tmcast->qp_list, list) {<br /><br />Given that we hold the global mcast_lock, how is RCU helping here?<br /><br />Is there a lock-free read-side traversal path somewhere that I am<br />missing?<br /><br />> + if (p->qp == mqp->qp) {<br />> + spin_unlock_irqrestore(&mcast_lock, flags);<br />> + return ESRCH;<br />> + }<br />> + }<br />> + list_add_tail_rcu(&mqp->list, &tmcast->qp_list);<br /><br />Ditto...<br /><br />> + spin_unlock_irqrestore(&mcast_lock, flags);<br />> + return EEXIST;<br />> + }<br />> +<br />> + list_add_tail_rcu(&mqp->list, &mcast->qp_list);<br /><br />Ditto...<br /><br />> + spin_unlock_irqrestore(&mcast_lock, flags);<br />> +<br />> + atomic_inc(&mcast->refcount);<br />> + rb_link_node(&mcast->rb_node, pn, n);<br />> + rb_insert_color(&mcast->rb_node, &mcast_tree);<br />> +<br />> + spin_unlock_irqrestore(&mcast_lock, flags);<br />> +<br />> + return 0;<br />> +}<br />> +<br />> +static int ipath_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid,<br />> + u16 lid)<br />> +{<br />> + struct ipath_qp *qp = to_iqp(ibqp);<br />> + struct ipath_mcast *mcast;<br />> + struct ipath_mcast_qp *mqp;<br />> +<br />> + /*<br />> + * Allocate data structures since its better to do this outside of<br />> + * spin locks and it will most likely be needed.<br />> + */<br />> + mcast = ipath_mcast_alloc(gid);<br />> + if (mcast == NULL)<br />> + return -ENOMEM;<br />> + mqp = ipath_mcast_qp_alloc(qp);<br />> + if (mqp == NULL) {<br />> + ipath_mcast_free(mcast);<br />> + return -ENOMEM;<br />> + }<br />> + switch (ipath_mcast_add(mcast, mqp)) {<br />> + case ESRCH:<br />> + /* Neither was used: can't attach the same QP twice. */<br />> + ipath_mcast_qp_free(mqp);<br />> + ipath_mcast_free(mcast);<br />> + return -EINVAL;<br />> + case EEXIST: /* The mcast wasn't used */<br />> + ipath_mcast_free(mcast);<br />> + break;<br />> + default:<br />> + break;<br />> + }<br />> + return 0;<br />> +}<br />> +<br />> +static int ipath_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid,<br />> + u16 lid)<br />> +{<br />> + struct ipath_qp *qp = to_iqp(ibqp);<br />> + struct ipath_mcast *mcast = NULL;<br />> + struct ipath_mcast_qp *p, *tmp;<br />> + struct rb_node *n;<br />> + unsigned long flags;<br />> + int last = 0;<br />> +<br />> + spin_lock_irqsave(&mcast_lock, flags);<br />> +<br />> + /* Find the GID in the mcast table. */<br />> + n = mcast_tree.rb_node;<br />> + while (1) {<br />> + int ret;<br />> +<br />> + if (n == NULL) {<br />> + spin_unlock_irqrestore(&mcast_lock, flags);<br />> + return 0;<br />> + }<br />> +<br />> + mcast = rb_entry(n, struct ipath_mcast, rb_node);<br />> + ret = memcmp(gid->raw, mcast->mgid.raw, sizeof(union ib_gid));<br />> + if (ret < 0)<br />> + n = n->rb_left;<br />> + else if (ret > 0)<br />> + n = n->rb_right;<br />> + else<br />> + break;<br />> + }<br />> +<br />> + /* Search the QP list. 
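For reference, the lock-free read side that the _rcu list primitives are
normally paired with would look roughly like the sketch below: a walk of
qp_list under rcu_read_lock() with mcast_lock not held at all.  No such
path appears in this hunk, so this is purely illustrative and not code
from the patch:

	/*
	 * Illustrative only: a lock-free reader of mcast->qp_list.
	 * Relies on <linux/rcupdate.h> and <linux/list.h>; "mcast" is
	 * assumed to have been looked up and reference-counted already.
	 */
	struct ipath_mcast_qp *p;

	rcu_read_lock();
	list_for_each_entry_rcu(p, &mcast->qp_list, list) {
		/* deliver to p->qp without taking mcast_lock */
	}
	rcu_read_unlock();

If every walk of qp_list really does run under mcast_lock, the plain
list_add_tail()/list_del() primitives would be sufficient.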
> +
> +static int ipath_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid,
> + u16 lid)
> +{
> + struct ipath_qp *qp = to_iqp(ibqp);
> + struct ipath_mcast *mcast;
> + struct ipath_mcast_qp *mqp;
> +
> + /*
> + * Allocate data structures since its better to do this outside of
> + * spin locks and it will most likely be needed.
> + */
> + mcast = ipath_mcast_alloc(gid);
> + if (mcast == NULL)
> + return -ENOMEM;
> + mqp = ipath_mcast_qp_alloc(qp);
> + if (mqp == NULL) {
> + ipath_mcast_free(mcast);
> + return -ENOMEM;
> + }
> + switch (ipath_mcast_add(mcast, mqp)) {
> + case ESRCH:
> + /* Neither was used: can't attach the same QP twice. */
> + ipath_mcast_qp_free(mqp);
> + ipath_mcast_free(mcast);
> + return -EINVAL;
> + case EEXIST: /* The mcast wasn't used */
> + ipath_mcast_free(mcast);
> + break;
> + default:
> + break;
> + }
> + return 0;
> +}
> +
> +static int ipath_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid,
> + u16 lid)
> +{
> + struct ipath_qp *qp = to_iqp(ibqp);
> + struct ipath_mcast *mcast = NULL;
> + struct ipath_mcast_qp *p, *tmp;
> + struct rb_node *n;
> + unsigned long flags;
> + int last = 0;
> +
> + spin_lock_irqsave(&mcast_lock, flags);
> +
> + /* Find the GID in the mcast table. */
> + n = mcast_tree.rb_node;
> + while (1) {
> + int ret;
> +
> + if (n == NULL) {
> + spin_unlock_irqrestore(&mcast_lock, flags);
> + return 0;
> + }
> +
> + mcast = rb_entry(n, struct ipath_mcast, rb_node);
> + ret = memcmp(gid->raw, mcast->mgid.raw, sizeof(union ib_gid));
> + if (ret < 0)
> + n = n->rb_left;
> + else if (ret > 0)
> + n = n->rb_right;
> + else
> + break;
> + }
> +
> + /* Search the QP list. */
> + list_for_each_entry_safe(p, tmp, &mcast->qp_list, list) {
> + if (p->qp != qp)
> + continue;
> + /*
> + * We found it, so remove it, but don't poison the forward link
> + * until we are sure there are no list walkers.
> + */
> + list_del_rcu(&p->list);

Ditto...

> + spin_unlock_irqrestore(&mcast_lock, flags);
> +
> + /* If this was the last attached QP, remove the GID too. */
> + if (list_empty(&mcast->qp_list)) {
> + rb_erase(&mcast->rb_node, &mcast_tree);
> + last = 1;
> + }
> + break;
> + }
> +
> + spin_unlock_irqrestore(&mcast_lock, flags);
> +
> + if (p) {
> + /*
> + * Wait for any list walkers to finish before freeing the
> + * list element.
> + */
> + wait_event(mcast->wait, atomic_read(&mcast->refcount) <= 1);
> + ipath_mcast_qp_free(p);
> + }
> + if (last) {
> + atomic_dec(&mcast->refcount);
> + wait_event(mcast->wait, !atomic_read(&mcast->refcount));
> + ipath_mcast_free(mcast);
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Copy data to SGE memory.
> + */
> +static void copy_sge(struct ipath_sge_state *ss, void *data, u32 length)
> +{
> + struct ipath_sge *sge = &ss->sge;
> +
> + while (length) {
> + u32 len = sge->length;
> +
> + BUG_ON(len == 0);
> + if (len > length)
> + len = length;
> + memcpy(sge->vaddr, data, len);
> + sge->vaddr += len;
> + sge->length -= len;
> + sge->sge_length -= len;
> + if (sge->sge_length == 0) {
> + if (--ss->num_sge)
> + *sge = *ss->sg_list++;
> + } else if (sge->length == 0 && sge->mr != NULL) {
> + if (++sge->n >= IPATH_SEGSZ) {
> + if (++sge->m >= sge->mr->mapsz)
> + break;
> + sge->n = 0;
> + }
> + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr;
> + sge->length = sge->mr->map[sge->m]->segs[sge->n].length;
> + }
> + data += len;
> + length -= len;
> + }
> +}
> +
> +/*
> + * Skip over length bytes of SGE memory.
> + */
> +static void skip_sge(struct ipath_sge_state *ss, u32 length)
> +{
> + struct ipath_sge *sge = &ss->sge;
> +
> + while (length > sge->sge_length) {
> + length -= sge->sge_length;
> + ss->sge = *ss->sg_list++;
> + }
> + while (length) {
> + u32 len = sge->length;
> +
> + BUG_ON(len == 0);
> + if (len > length)
> + len = length;
> + sge->vaddr += len;
> + sge->length -= len;
> + sge->sge_length -= len;
> + if (sge->sge_length == 0) {
> + if (--ss->num_sge)
> + *sge = *ss->sg_list++;
> + } else if (sge->length == 0 && sge->mr != NULL) {
> + if (++sge->n >= IPATH_SEGSZ) {
> + if (++sge->m >= sge->mr->mapsz)
> + break;
> + sge->n = 0;
> + }
> + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr;
> + sge->length = sge->mr->map[sge->m]->segs[sge->n].length;
> + }
> + length -= len;
> + }
> +}
> +
> +static inline u32 alloc_qpn(struct ipath_qp_table *qpt)
> +{
> + u32 i, offset, max_scan, qpn;
> + struct qpn_map *map;
> +
> + qpn = qpt->last + 1;
> + if (qpn >= QPN_MAX)
> + qpn = 2;
> + offset = qpn & BITS_PER_PAGE_MASK;
> + map = &qpt->map[qpn / BITS_PER_PAGE];
> + max_scan = qpt->nmaps - !offset;
> + for (i = 0;;) {
/>> + if (unlikely(!map->page)) {<br />> + unsigned long page = get_zeroed_page(GFP_KERNEL);<br />> + unsigned long flags;<br />> +<br />> + /*<br />> + * Free the page if someone raced with us<br />> + * installing it:<br />> + */<br />> + spin_lock_irqsave(&qpt->lock, flags);<br />> + if (map->page)<br />> + free_page(page);<br />> + else<br />> + map->page = (void *)page;<br />> + spin_unlock_irqrestore(&qpt->lock, flags);<br />> + if (unlikely(!map->page))<br />> + break;<br />> + }<br />> + if (likely(atomic_read(&map->n_free))) {<br />> + do {<br />> + if (!test_and_set_bit(offset, map->page)) {<br />> + atomic_dec(&map->n_free);<br />> + qpt->last = qpn;<br />> + return qpn;<br />> + }<br />> + offset = find_next_offset(map, offset);<br />> + qpn = mk_qpn(qpt, map, offset);<br />> + /*<br />> + * This test differs from alloc_pidmap().<br />> + * If find_next_offset() does find a zero bit,<br />> + * we don't need to check for QPN wrapping<br />> + * around past our starting QPN. We<br />> + * just need to be sure we don't loop forever.<br />> + */<br />> + } while (offset < BITS_PER_PAGE && qpn < QPN_MAX);<br />> + }<br />> + /*<br />> + * In order to keep the number of pages allocated to a minimum,<br />> + * we scan the all existing pages before increasing the size<br />> + * of the bitmap table.<br />> + */<br />> + if (++i > max_scan) {<br />> + if (qpt->nmaps == QPNMAP_ENTRIES)<br />> + break;<br />> + map = &qpt->map[qpt->nmaps++];<br />> + offset = 0;<br />> + } else if (map < &qpt->map[qpt->nmaps]) {<br />> + ++map;<br />> + offset = 0;<br />> + } else {<br />> + map = &qpt->map[0];<br />> + offset = 2;<br />> + }<br />> + qpn = mk_qpn(qpt, map, offset);<br />> + }<br />> + return 0;<br />> +}<br />> +<br />> +static inline void free_qpn(struct ipath_qp_table *qpt, u32 qpn)<br />> +{<br />> + struct qpn_map *map;<br />> +<br />> + map = qpt->map + qpn / BITS_PER_PAGE;<br />> + if (map->page)<br />> + clear_bit(qpn & BITS_PER_PAGE_MASK, map->page);<br />> + atomic_inc(&map->n_free);<br />> +}<br />> +<br />> +/*<br />> + * Allocate the next available QPN and put the QP into the hash table.<br />> + * The hash table holds a reference to the QP.<br />> + */<br />> +static int ipath_alloc_qpn(struct ipath_qp_table *qpt, struct ipath_qp *qp,<br />> + enum ib_qp_type type)<br />> +{<br />> + unsigned long flags;<br />> + u32 qpn;<br />> +<br />> + if (type == IB_QPT_SMI)<br />> + qpn = 0;<br />> + else if (type == IB_QPT_GSI)<br />> + qpn = 1;<br />> + else {<br />> + /* Allocate the next available QPN */<br />> + qpn = alloc_qpn(qpt);<br />> + if (qpn == 0) {<br />> + return -ENOMEM;<br />> + }<br />> + }<br />> + qp->ibqp.qp_num = qpn;<br />> +<br />> + /* Add the QP to the hash table. */<br />> + spin_lock_irqsave(&qpt->lock, flags);<br />> +<br />> + qpn %= qpt->max;<br />> + qp->next = qpt->table[qpn];<br />> + qpt->table[qpn] = qp;<br />> + atomic_inc(&qp->refcount);<br />> +<br />> + spin_unlock_irqrestore(&qpt->lock, flags);<br />> + return 0;<br />> +}<br />> +<br />> +/*<br />> + * Remove the QP from the table so it can't be found asynchronously by<br />> + * the receive interrupt routine.<br />> + */<br />> +static void ipath_free_qp(struct ipath_qp_table *qpt, struct ipath_qp *qp)<br />> +{<br />> + struct ipath_qp *q, **qpp;<br />> + unsigned long flags;<br />> + int fnd = 0;<br />> +<br />> + spin_lock_irqsave(&qpt->lock, flags);<br />> +<br />> + /* Remove QP from the hash table. 
*/<br />> + qpp = &qpt->table[qp->ibqp.qp_num % qpt->max];<br />> + for (; (q = *qpp) != NULL; qpp = &q->next) {<br />> + if (q == qp) {<br />> + *qpp = qp->next;<br />> + qp->next = NULL;<br />> + atomic_dec(&qp->refcount);<br />> + fnd = 1;<br />> + break;<br />> + }<br />> + }<br />> +<br />> + spin_unlock_irqrestore(&qpt->lock, flags);<br />> +<br />> + if (!fnd)<br />> + return;<br />> +<br />> + /* If QPN is not reserved, mark QPN free in the bitmap. */<br />> + if (qp->ibqp.qp_num > 1)<br />> + free_qpn(qpt, qp->ibqp.qp_num);<br />> +<br />> + wait_event(qp->wait, !atomic_read(&qp->refcount));<br />> +}<br />> +<br />> +/*<br />> + * Remove all QPs from the table.<br />> + */<br />> +static void ipath_free_all_qps(struct ipath_qp_table *qpt)<br />> +{<br />> + unsigned long flags;<br />> + struct ipath_qp *qp, *nqp;<br />> + u32 n;<br />> +<br />> + for (n = 0; n < qpt->max; n++) {<br />> + spin_lock_irqsave(&qpt->lock, flags);<br />> + qp = qpt->table[n];<br />> + qpt->table[n] = NULL;<br />> + spin_unlock_irqrestore(&qpt->lock, flags);<br />> +<br />> + while (qp) {<br />> + nqp = qp->next;<br />> + if (qp->ibqp.qp_num > 1)<br />> + free_qpn(qpt, qp->ibqp.qp_num);<br />> + if (!atomic_dec_and_test(&qp->refcount) ||<br />> + !ipath_destroy_qp(&qp->ibqp))<br />> + _VERBS_INFO("QP memory leak!\n");<br />> + qp = nqp;<br />> + }<br />> + }<br />> +<br />> + for (n = 0; n < ARRAY_SIZE(qpt->map); n++) {<br />> + if (qpt->map[n].page)<br />> + free_page((unsigned long)qpt->map[n].page);<br />> + }<br />> +}<br />> +<br />> +/*<br />> + * Return the QP with the given QPN.<br />> + * The caller is responsible for decrementing the QP reference count when done.<br />> + */<br />> +static struct ipath_qp *ipath_lookup_qpn(struct ipath_qp_table *qpt, u32 qpn)<br />> +{<br />> + unsigned long flags;<br />> + struct ipath_qp *qp;<br />> +<br />> + spin_lock_irqsave(&qpt->lock, flags);<br />> +<br />> + for (qp = qpt->table[qpn % qpt->max]; qp; qp = qp->next) {<br />> + if (qp->ibqp.qp_num == qpn) {<br />> + atomic_inc(&qp->refcount);<br />> + break;<br />> + }<br />> + }<br />> +<br />> + spin_unlock_irqrestore(&qpt->lock, flags);<br />> + return qp;<br />> +}<br />> +<br />> +static int ipath_alloc_lkey(struct ipath_lkey_table *rkt,<br />> + struct ipath_mregion *mr)<br />> +{<br />> + unsigned long flags;<br />> + u32 r;<br />> + u32 n;<br />> +<br />> + spin_lock_irqsave(&rkt->lock, flags);<br />> +<br />> + /* Find the next available LKEY */<br />> + r = n = rkt->next;<br />> + for (;;) {<br />> + if (rkt->table[r] == NULL)<br />> + break;<br />> + r = (r + 1) & (rkt->max - 1);<br />> + if (r == n) {<br />> + spin_unlock_irqrestore(&rkt->lock, flags);<br />> + _VERBS_INFO("LKEY table full\n");<br />> + return 0;<br />> + }<br />> + }<br />> + rkt->next = (r + 1) & (rkt->max - 1);<br />> + /*<br />> + * Make sure lkey is never zero which is reserved to indicate an<br />> + * unrestricted LKEY.<br />> + */<br />> + rkt->gen++;<br />> + mr->lkey = (r << (32 - ib_ipath_lkey_table_size)) |<br />> + ((((1 << (24 - ib_ipath_lkey_table_size)) - 1) & rkt->gen) << 8);<br />> + if (mr->lkey == 0) {<br />> + mr->lkey |= 1 << 8;<br />> + rkt->gen++;<br />> + }<br />> + rkt->table[r] = mr;<br />> + spin_unlock_irqrestore(&rkt->lock, flags);<br />> +<br />> + return 1;<br />> +}<br />> +<br />> +static void ipath_free_lkey(struct ipath_lkey_table *rkt, u32 lkey)<br />> +{<br />> + unsigned long flags;<br />> + u32 r;<br />> +<br />> + if (lkey == 0)<br />> + return;<br />> + r = lkey >> (32 - 
ib_ipath_lkey_table_size);<br />> + spin_lock_irqsave(&rkt->lock, flags);<br />> + rkt->table[r] = NULL;<br />> + spin_unlock_irqrestore(&rkt->lock, flags);<br />> +}<br />> +<br />> +/*<br />> + * Check the IB SGE for validity and initialize our internal version of it.<br />> + * Return 1 if OK, else zero.<br />> + */<br />> +static int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge,<br />> + struct ib_sge *sge, int acc)<br />> +{<br />> + struct ipath_mregion *mr;<br />> + size_t off;<br />> +<br />> + /*<br />> + * We use LKEY == zero to mean a physical kmalloc() address.<br />> + * This is a bit of a hack since we rely on dma_map_single()<br />> + * being reversible by calling bus_to_virt().<br />> + */<br />> + if (sge->lkey == 0) {<br />> + isge->mr = NULL;<br />> + isge->vaddr = bus_to_virt(sge->addr);<br />> + isge->length = sge->length;<br />> + isge->sge_length = sge->length;<br />> + return 1;<br />> + }<br />> + spin_lock(&rkt->lock);<br />> + mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))];<br />> + spin_unlock(&rkt->lock);<br />> + if (unlikely(mr == NULL || mr->lkey != sge->lkey))<br />> + return 0;<br />> +<br />> + off = sge->addr - mr->user_base;<br />> + if (unlikely(sge->addr < mr->user_base ||<br />> + off + sge->length > mr->length ||<br />> + (mr->access_flags & acc) != acc))<br />> + return 0;<br />> +<br />> + off += mr->offset;<br />> + isge->mr = mr;<br />> + isge->m = 0;<br />> + isge->n = 0;<br />> + while (off >= mr->map[isge->m]->segs[isge->n].length) {<br />> + off -= mr->map[isge->m]->segs[isge->n].length;<br />> + if (++isge->n >= IPATH_SEGSZ) {<br />> + isge->m++;<br />> + isge->n = 0;<br />> + }<br />> + }<br />> + isge->vaddr = mr->map[isge->m]->segs[isge->n].vaddr + off;<br />> + isge->length = mr->map[isge->m]->segs[isge->n].length - off;<br />> + isge->sge_length = sge->length;<br />> + return 1;<br />> +}<br />> +<br />> +/*<br />> + * Initialize the qp->s_sge after a restart.<br />> + * The QP s_lock should be held.<br />> + */<br />> +static void ipath_init_restart(struct ipath_qp *qp, struct ipath_swqe *wqe)<br />> +{<br />> + struct ipath_ibdev *dev;<br />> + u32 len;<br />> +<br />> + len = ((qp->s_psn - wqe->psn) & 0xFFFFFF) *<br />> + ib_mtu_enum_to_int(qp->path_mtu);<br />> + qp->s_sge.sge = wqe->sg_list[0];<br />> + qp->s_sge.sg_list = wqe->sg_list + 1;<br />> + qp->s_sge.num_sge = wqe->wr.num_sge;<br />> + skip_sge(&qp->s_sge, len);<br />> + qp->s_len = wqe->length - len;<br />> + dev = to_idev(qp->ibqp.device);<br />> + spin_lock(&dev->pending_lock);<br />> + if (qp->timerwait.next == LIST_POISON1)<br />> + list_add_tail(&qp->timerwait,<br />> + &dev->pending[dev->pending_index]);<br />> + spin_unlock(&dev->pending_lock);<br />> +}<br />> +<br />> +/*<br />> + * Check the IB virtual address, length, and RKEY.<br />> + * Return 1 if OK, else zero.<br />> + * The QP r_rq.lock should be held.<br />> + */<br />> +static int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss,<br />> + u32 len, u64 vaddr, u32 rkey, int acc)<br />> +{<br />> + struct ipath_lkey_table *rkt = &dev->lk_table;<br />> + struct ipath_sge *sge = &ss->sge;<br />> + struct ipath_mregion *mr;<br />> + size_t off;<br />> +<br />> + spin_lock(&rkt->lock);<br />> + mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))];<br />> + spin_unlock(&rkt->lock);<br />> + if (unlikely(mr == NULL || mr->lkey != rkey))<br />> + return 0;<br />> +<br />> + off = vaddr - mr->iova;<br />> + if (unlikely(vaddr < mr->iova || off + len > 
mr->length ||<br />> + (mr->access_flags & acc) == 0))<br />> + return 0;<br />> +<br />> + off += mr->offset;<br />> + sge->mr = mr;<br />> + sge->m = 0;<br />> + sge->n = 0;<br />> + while (off >= mr->map[sge->m]->segs[sge->n].length) {<br />> + off -= mr->map[sge->m]->segs[sge->n].length;<br />> + if (++sge->n >= IPATH_SEGSZ) {<br />> + sge->m++;<br />> + sge->n = 0;<br />> + }<br />> + }<br />> + sge->vaddr = mr->map[sge->m]->segs[sge->n].vaddr + off;<br />> + sge->length = mr->map[sge->m]->segs[sge->n].length - off;<br />> + sge->sge_length = len;<br />> + ss->sg_list = NULL;<br />> + ss->num_sge = 1;<br />> + return 1;<br />> +}<br />> +<br />> +/*<br />> + * Add a new entry to the completion queue.<br />> + * This may be called with one of the qp->s_lock or qp->r_rq.lock held.<br />> + */<br />> +static void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig)<br />> +{<br />> + unsigned long flags;<br />> + u32 next;<br />> +<br />> + spin_lock_irqsave(&cq->lock, flags);<br />> +<br />> + cq->queue[cq->head] = *entry;<br />> + next = cq->head + 1;<br />> + if (next == cq->ibcq.cqe)<br />> + next = 0;<br />> + if (next != cq->tail)<br />> + cq->head = next;<br />> + else {<br />> + /* XXX - need to mark current wr as having an error... */<br />> + }<br />> +<br />> + if (cq->notify == IB_CQ_NEXT_COMP ||<br />> + (cq->notify == IB_CQ_SOLICITED && sig)) {<br />> + cq->notify = IB_CQ_NONE;<br />> + cq->triggered++;<br />> + /*<br />> + * This will cause send_complete() to be called in<br />> + * another thread.<br />> + */<br />> + tasklet_schedule(&cq->comptask);<br />> + }<br />> +<br />> + spin_unlock_irqrestore(&cq->lock, flags);<br />> +<br />> + if (entry->status != IB_WC_SUCCESS)<br />> + to_idev(cq->ibcq.device)->n_wqe_errs++;<br />> +}<br />> +<br />> +static void send_complete(unsigned long data)<br />> +{<br />> + struct ipath_cq *cq = (struct ipath_cq *)data;<br />> +<br />> + /*<br />> + * The completion handler will most likely rearm the notification<br />> + * and poll for all pending entries. 
If a new completion entry<br />> + * is added while we are in this routine, tasklet_schedule()<br />> + * won't call us again until we return so we check triggered to<br />> + * see if we need to call the handler again.<br />> + */<br />> + for (;;) {<br />> + u8 triggered = cq->triggered;<br />> +<br />> + cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);<br />> +<br />> + if (cq->triggered == triggered)<br />> + return;<br />> + }<br />> +}<br />> +<br />> +/*<br />> + * This is the QP state transition table.<br />> + * See ipath_modify_qp() for details.<br />> + */<br />> +static const struct {<br />> + int trans;<br />> + u32 req_param[IB_QPT_RAW_IPV6];<br />> + u32 opt_param[IB_QPT_RAW_IPV6];<br />> +} qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = {<br />> + [IB_QPS_RESET] = {<br />> + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST },<br />> + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR },<br />> + [IB_QPS_INIT] = {<br />> + .trans = IPATH_TRANS_RST2INIT,<br />> + .req_param = {<br />> + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_QKEY),<br />> + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_QKEY),<br />> + [IB_QPT_UD] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_PORT |<br />> + IB_QP_QKEY),<br />> + [IB_QPT_UC] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_PORT |<br />> + IB_QP_ACCESS_FLAGS),<br />> + [IB_QPT_RC] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_PORT |<br />> + IB_QP_ACCESS_FLAGS),<br />> + },<br />> + },<br />> + },<br />> + [IB_QPS_INIT] = {<br />> + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST },<br />> + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR },<br />> + [IB_QPS_INIT] = {<br />> + .trans = IPATH_TRANS_INIT2INIT,<br />> + .opt_param = {<br />> + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_QKEY),<br />> + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_QKEY),<br />> + [IB_QPT_UD] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_PORT |<br />> + IB_QP_QKEY),<br />> + [IB_QPT_UC] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_PORT |<br />> + IB_QP_ACCESS_FLAGS),<br />> + [IB_QPT_RC] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_PORT |<br />> + IB_QP_ACCESS_FLAGS),<br />> + }<br />> + },<br />> + [IB_QPS_RTR] = {<br />> + .trans = IPATH_TRANS_INIT2RTR,<br />> + .req_param = {<br />> + [IB_QPT_UC] = (IB_QP_AV |<br />> + IB_QP_PATH_MTU |<br />> + IB_QP_DEST_QPN |<br />> + IB_QP_RQ_PSN),<br />> + [IB_QPT_RC] = (IB_QP_AV |<br />> + IB_QP_PATH_MTU |<br />> + IB_QP_DEST_QPN |<br />> + IB_QP_RQ_PSN |<br />> + IB_QP_MAX_DEST_RD_ATOMIC |<br />> + IB_QP_MIN_RNR_TIMER),<br />> + },<br />> + .opt_param = {<br />> + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_QKEY),<br />> + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_QKEY),<br />> + [IB_QPT_UD] = (IB_QP_PKEY_INDEX |<br />> + IB_QP_QKEY),<br />> + [IB_QPT_UC] = (IB_QP_ALT_PATH |<br />> + IB_QP_ACCESS_FLAGS |<br />> + IB_QP_PKEY_INDEX),<br />> + [IB_QPT_RC] = (IB_QP_ALT_PATH |<br />> + IB_QP_ACCESS_FLAGS |<br />> + IB_QP_PKEY_INDEX),<br />> + }<br />> + }<br />> + },<br />> + [IB_QPS_RTR] = {<br />> + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST },<br />> + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR },<br />> + [IB_QPS_RTS] = {<br />> + .trans = IPATH_TRANS_RTR2RTS,<br />> + .req_param = {<br />> + [IB_QPT_SMI] = IB_QP_SQ_PSN,<br />> + [IB_QPT_GSI] = IB_QP_SQ_PSN,<br />> + [IB_QPT_UD] = IB_QP_SQ_PSN,<br />> + [IB_QPT_UC] = IB_QP_SQ_PSN,<br />> + [IB_QPT_RC] = (IB_QP_TIMEOUT |<br />> + IB_QP_RETRY_CNT |<br />> + IB_QP_RNR_RETRY |<br />> + IB_QP_SQ_PSN |<br />> + IB_QP_MAX_QP_RD_ATOMIC),<br />> + },<br />> + .opt_param = {<br />> + 
[IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_UC] = (IB_QP_CUR_STATE |<br />> + IB_QP_ALT_PATH |<br />> + IB_QP_ACCESS_FLAGS |<br />> + IB_QP_PKEY_INDEX |<br />> + IB_QP_PATH_MIG_STATE),<br />> + [IB_QPT_RC] = (IB_QP_CUR_STATE |<br />> + IB_QP_ALT_PATH |<br />> + IB_QP_ACCESS_FLAGS |<br />> + IB_QP_PKEY_INDEX |<br />> + IB_QP_MIN_RNR_TIMER |<br />> + IB_QP_PATH_MIG_STATE),<br />> + }<br />> + }<br />> + },<br />> + [IB_QPS_RTS] = {<br />> + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST },<br />> + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR },<br />> + [IB_QPS_RTS] = {<br />> + .trans = IPATH_TRANS_RTS2RTS,<br />> + .opt_param = {<br />> + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_UC] = (IB_QP_ACCESS_FLAGS |<br />> + IB_QP_ALT_PATH |<br />> + IB_QP_PATH_MIG_STATE),<br />> + [IB_QPT_RC] = (IB_QP_ACCESS_FLAGS |<br />> + IB_QP_ALT_PATH |<br />> + IB_QP_PATH_MIG_STATE |<br />> + IB_QP_MIN_RNR_TIMER),<br />> + }<br />> + },<br />> + [IB_QPS_SQD] = {<br />> + .trans = IPATH_TRANS_RTS2SQD,<br />> + },<br />> + },<br />> + [IB_QPS_SQD] = {<br />> + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST },<br />> + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR },<br />> + [IB_QPS_RTS] = {<br />> + .trans = IPATH_TRANS_SQD2RTS,<br />> + .opt_param = {<br />> + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_UC] = (IB_QP_CUR_STATE |<br />> + IB_QP_ALT_PATH |<br />> + IB_QP_ACCESS_FLAGS |<br />> + IB_QP_PATH_MIG_STATE),<br />> + [IB_QPT_RC] = (IB_QP_CUR_STATE |<br />> + IB_QP_ALT_PATH |<br />> + IB_QP_ACCESS_FLAGS |<br />> + IB_QP_MIN_RNR_TIMER |<br />> + IB_QP_PATH_MIG_STATE),<br />> + }<br />> + },<br />> + [IB_QPS_SQD] = {<br />> + .trans = IPATH_TRANS_SQD2SQD,<br />> + .opt_param = {<br />> + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | IB_QP_QKEY),<br />> + [IB_QPT_UC] = (IB_QP_AV |<br />> + IB_QP_TIMEOUT |<br />> + IB_QP_CUR_STATE |<br />> + IB_QP_ALT_PATH |<br />> + IB_QP_ACCESS_FLAGS |<br />> + IB_QP_PKEY_INDEX |<br />> + IB_QP_PATH_MIG_STATE),<br />> + [IB_QPT_RC] = (IB_QP_AV |<br />> + IB_QP_TIMEOUT |<br />> + IB_QP_RETRY_CNT |<br />> + IB_QP_RNR_RETRY |<br />> + IB_QP_MAX_QP_RD_ATOMIC |<br />> + IB_QP_MAX_DEST_RD_ATOMIC |<br />> + IB_QP_CUR_STATE |<br />> + IB_QP_ALT_PATH |<br />> + IB_QP_ACCESS_FLAGS |<br />> + IB_QP_PKEY_INDEX |<br />> + IB_QP_MIN_RNR_TIMER |<br />> + IB_QP_PATH_MIG_STATE),<br />> + }<br />> + }<br />> + },<br />> + [IB_QPS_SQE] = {<br />> + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST },<br />> + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR },<br />> + [IB_QPS_RTS] = {<br />> + .trans = IPATH_TRANS_SQERR2RTS,<br />> + .opt_param = {<br />> + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY),<br />> + [IB_QPT_UC] = IB_QP_CUR_STATE,<br />> + [IB_QPT_RC] = (IB_QP_CUR_STATE |<br />> + IB_QP_MIN_RNR_TIMER),<br />> + }<br />> + }<br />> + },<br />> + [IB_QPS_ERR] = {<br />> + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST },<br />> + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }<br />> + 
}<br />> +};<br />> +<br />> +/*<br />> + * Initialize the QP state to the reset state.<br />> + */<br />> +static void ipath_reset_qp(struct ipath_qp *qp)<br />> +{<br />> + qp->remote_qpn = 0;<br />> + qp->qkey = 0;<br />> + qp->qp_access_flags = 0;<br />> + qp->s_hdrwords = 0;<br />> + qp->s_psn = 0;<br />> + qp->r_psn = 0;<br />> + atomic_set(&qp->msn, 0);<br />> + if (qp->ibqp.qp_type == IB_QPT_RC) {<br />> + qp->s_state = IB_OPCODE_RC_SEND_LAST;<br />> + qp->r_state = IB_OPCODE_RC_SEND_LAST;<br />> + } else {<br />> + qp->s_state = IB_OPCODE_UC_SEND_LAST;<br />> + qp->r_state = IB_OPCODE_UC_SEND_LAST;<br />> + }<br />> + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;<br />> + qp->s_nak_state = 0;<br />> + qp->s_rnr_timeout = 0;<br />> + qp->s_head = 0;<br />> + qp->s_tail = 0;<br />> + qp->s_cur = 0;<br />> + qp->s_last = 0;<br />> + qp->s_ssn = 1;<br />> + qp->s_lsn = 0;<br />> + qp->r_rq.head = 0;<br />> + qp->r_rq.tail = 0;<br />> + qp->r_reuse_sge = 0;<br />> +}<br />> +<br />> +/*<br />> + * Flush send work queue.<br />> + * The QP s_lock should be held.<br />> + */<br />> +static void ipath_sqerror_qp(struct ipath_qp *qp, struct ib_wc *wc)<br />> +{<br />> + struct ipath_ibdev *dev = to_idev(qp->ibqp.device);<br />> + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last);<br />> +<br />> + _VERBS_INFO("Send queue error on QP%d/%d: err: %d\n",<br />> + qp->ibqp.qp_num, qp->remote_qpn, wc->status);<br />> +<br />> + spin_lock(&dev->pending_lock);<br />> + /* XXX What if its already removed by the timeout code? */<br />> + if (qp->timerwait.next != LIST_POISON1)<br />> + list_del(&qp->timerwait);<br />> + if (qp->piowait.next != LIST_POISON1)<br />> + list_del(&qp->piowait);<br />> + spin_unlock(&dev->pending_lock);<br />> +<br />> + ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1);<br />> + if (++qp->s_last >= qp->s_size)<br />> + qp->s_last = 0;<br />> +<br />> + wc->status = IB_WC_WR_FLUSH_ERR;<br />> +<br />> + while (qp->s_last != qp->s_head) {<br />> + wc->wr_id = wqe->wr.wr_id;<br />> + wc->opcode = wc_opcode[wqe->wr.opcode];<br />> + ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1);<br />> + if (++qp->s_last >= qp->s_size)<br />> + qp->s_last = 0;<br />> + wqe = get_swqe_ptr(qp, qp->s_last);<br />> + }<br />> + qp->s_cur = qp->s_tail = qp->s_head;<br />> + qp->state = IB_QPS_SQE;<br />> +}<br />> +<br />> +/*<br />> + * Flush both send and receive work queues.<br />> + * QP r_rq.lock and s_lock should be held.<br />> + */<br />> +static void ipath_error_qp(struct ipath_qp *qp)<br />> +{<br />> + struct ipath_ibdev *dev = to_idev(qp->ibqp.device);<br />> + struct ib_wc wc;<br />> +<br />> + _VERBS_INFO("QP%d/%d in error state\n",<br />> + qp->ibqp.qp_num, qp->remote_qpn);<br />> +<br />> + spin_lock(&dev->pending_lock);<br />> + /* XXX What if its already removed by the timeout code? 
*/<br />> + if (qp->timerwait.next != LIST_POISON1)<br />> + list_del(&qp->timerwait);<br />> + if (qp->piowait.next != LIST_POISON1)<br />> + list_del(&qp->piowait);<br />> + spin_unlock(&dev->pending_lock);<br />> +<br />> + wc.status = IB_WC_WR_FLUSH_ERR;<br />> + wc.vendor_err = 0;<br />> + wc.byte_len = 0;<br />> + wc.imm_data = 0;<br />> + wc.qp_num = qp->ibqp.qp_num;<br />> + wc.src_qp = 0;<br />> + wc.wc_flags = 0;<br />> + wc.pkey_index = 0;<br />> + wc.slid = 0;<br />> + wc.sl = 0;<br />> + wc.dlid_path_bits = 0;<br />> + wc.port_num = 0;<br />> +<br />> + while (qp->s_last != qp->s_head) {<br />> + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last);<br />> +<br />> + wc.wr_id = wqe->wr.wr_id;<br />> + wc.opcode = wc_opcode[wqe->wr.opcode];<br />> + if (++qp->s_last >= qp->s_size)<br />> + qp->s_last = 0;<br />> + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 1);<br />> + }<br />> + qp->s_cur = qp->s_tail = qp->s_head;<br />> + qp->s_hdrwords = 0;<br />> + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;<br />> +<br />> + wc.opcode = IB_WC_RECV;<br />> + while (qp->r_rq.tail != qp->r_rq.head) {<br />> + wc.wr_id = get_rwqe_ptr(&qp->r_rq, qp->r_rq.tail)->wr_id;<br />> + if (++qp->r_rq.tail >= qp->r_rq.size)<br />> + qp->r_rq.tail = 0;<br />> + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1);<br />> + }<br />> +}<br />> +<br />> +static int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,<br />> + int attr_mask)<br />> +{<br />> + struct ipath_qp *qp = to_iqp(ibqp);<br />> + enum ib_qp_state cur_state, new_state;<br />> + u32 req_param, opt_param;<br />> + unsigned long flags;<br />> +<br />> + if (attr_mask & IB_QP_CUR_STATE) {<br />> + cur_state = attr->cur_qp_state;<br />> + if (cur_state != IB_QPS_RTR &&<br />> + cur_state != IB_QPS_RTS &&<br />> + cur_state != IB_QPS_SQD && cur_state != IB_QPS_SQE)<br />> + return -EINVAL;<br />> + spin_lock_irqsave(&qp->r_rq.lock, flags);<br />> + spin_lock(&qp->s_lock);<br />> + } else {<br />> + spin_lock_irqsave(&qp->r_rq.lock, flags);<br />> + spin_lock(&qp->s_lock);<br />> + cur_state = qp->state;<br />> + }<br />> +<br />> + if (attr_mask & IB_QP_STATE) {<br />> + new_state = attr->qp_state;<br />> + if (new_state < 0 || new_state > IB_QPS_ERR)<br />> + goto inval;<br />> + } else<br />> + new_state = cur_state;<br />> +<br />> + switch (qp_state_table[cur_state][new_state].trans) {<br />> + case IPATH_TRANS_INVALID:<br />> + goto inval;<br />> +<br />> + case IPATH_TRANS_ANY2RST:<br />> + ipath_reset_qp(qp);<br />> + break;<br />> +<br />> + case IPATH_TRANS_ANY2ERR:<br />> + ipath_error_qp(qp);<br />> + break;<br />> +<br />> + }<br />> +<br />> + req_param =<br />> + qp_state_table[cur_state][new_state].req_param[qp->ibqp.qp_type];<br />> + opt_param =<br />> + qp_state_table[cur_state][new_state].opt_param[qp->ibqp.qp_type];<br />> +<br />> + if ((req_param & attr_mask) != req_param)<br />> + goto inval;<br />> +<br />> + if (attr_mask & ~(req_param | opt_param | IB_QP_STATE))<br />> + goto inval;<br />> +<br />> + if (attr_mask & IB_QP_PKEY_INDEX) {<br />> + struct ipath_ibdev *dev = to_idev(ibqp->device);<br />> +<br />> + if (attr->pkey_index >= ipath_layer_get_npkeys(dev->ib_unit))<br />> + goto inval;<br />> + qp->s_pkey_index = attr->pkey_index;<br />> + }<br />> +<br />> + if (attr_mask & IB_QP_DEST_QPN)<br />> + qp->remote_qpn = attr->dest_qp_num;<br />> +<br />> + if (attr_mask & IB_QP_SQ_PSN) {<br />> + qp->s_next_psn = attr->sq_psn;<br />> + qp->s_last_psn = qp->s_next_psn - 1;<br />> + }<br />> +<br />> + if 
(attr_mask & IB_QP_RQ_PSN)<br />> + qp->r_psn = attr->rq_psn;<br />> +<br />> + if (attr_mask & IB_QP_ACCESS_FLAGS)<br />> + qp->qp_access_flags = attr->qp_access_flags;<br />> +<br />> + if (attr_mask & IB_QP_AV)<br />> + qp->remote_ah_attr = attr->ah_attr;<br />> +<br />> + if (attr_mask & IB_QP_PATH_MTU)<br />> + qp->path_mtu = attr->path_mtu;<br />> +<br />> + if (attr_mask & IB_QP_RETRY_CNT)<br />> + qp->s_retry = qp->s_retry_cnt = attr->retry_cnt;<br />> +<br />> + if (attr_mask & IB_QP_RNR_RETRY) {<br />> + qp->s_rnr_retry = attr->rnr_retry;<br />> + if (qp->s_rnr_retry > 7)<br />> + qp->s_rnr_retry = 7;<br />> + qp->s_rnr_retry_cnt = qp->s_rnr_retry;<br />> + }<br />> +<br />> + if (attr_mask & IB_QP_MIN_RNR_TIMER)<br />> + qp->s_min_rnr_timer = attr->min_rnr_timer & 0x1F;<br />> +<br />> + if (attr_mask & IB_QP_QKEY)<br />> + qp->qkey = attr->qkey;<br />> +<br />> + if (attr_mask & IB_QP_PKEY_INDEX)<br />> + qp->s_pkey_index = attr->pkey_index;<br />> +<br />> + qp->state = new_state;<br />> + spin_unlock(&qp->s_lock);<br />> + spin_unlock_irqrestore(&qp->r_rq.lock, flags);<br />> +<br />> + /*<br />> + * Try to move to ARMED if QP1 changed to the RTS state.<br />> + */<br />> + if (qp->ibqp.qp_num == 1 && new_state == IB_QPS_RTS) {<br />> + struct ipath_ibdev *dev = to_idev(ibqp->device);<br />> +<br />> + /*<br />> + * Bounce the link even if it was active so the SM will<br />> + * reinitialize the SMA's state.<br />> + */<br />> + ipath_kset_linkstate((dev->ib_unit << 16) | IPATH_IB_LINKDOWN);<br />> + ipath_kset_linkstate((dev->ib_unit << 16) | IPATH_IB_LINKARM);<br />> + }<br />> + return 0;<br />> +<br />> +inval:<br />> + spin_unlock(&qp->s_lock);<br />> + spin_unlock_irqrestore(&qp->r_rq.lock, flags);<br />> + return -EINVAL;<br />> +}<br />> +<br />> +/*<br />> + * Compute the AETH (syndrome + MSN).<br />> + * The QP s_lock should be held.<br />> + */<br />> +static u32 ipath_compute_aeth(struct ipath_qp *qp)<br />> +{<br />> + u32 aeth = atomic_read(&qp->msn) & 0xFFFFFF;<br />> +<br />> + if (qp->s_nak_state) {<br />> + aeth |= qp->s_nak_state << 24;<br />> + } else if (qp->ibqp.srq) {<br />> + /* Shared receive queues don't generate credits. */<br />> + aeth |= 0x1F << 24;<br />> + } else {<br />> + u32 min, max, x;<br />> + u32 credits;<br />> +<br />> + /*<br />> + * Compute the number of credits available (RWQEs).<br />> + * XXX Not holding the r_rq.lock here so there is a small<br />> + * chance that the pair of reads are not atomic.<br />> + */<br />> + credits = qp->r_rq.head - qp->r_rq.tail;<br />> + if ((int)credits < 0)<br />> + credits += qp->r_rq.size;<br />> + /* Binary search the credit table to find the code to use. 
*/<br />> + min = 0;<br />> + max = 31;<br />> + for (;;) {<br />> + x = (min + max) / 2;<br />> + if (credit_table[x] == credits)<br />> + break;<br />> + if (credit_table[x] > credits)<br />> + max = x;<br />> + else if (min == x)<br />> + break;<br />> + else<br />> + min = x;<br />> + }<br />> + aeth |= x << 24;<br />> + }<br />> + return cpu_to_be32(aeth);<br />> +}<br />> +<br />> +<br />> +static void no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev)<br />> +{<br />> + unsigned long flags;<br />> +<br />> + spin_lock_irqsave(&dev->pending_lock, flags);<br />> + if (qp->piowait.next == LIST_POISON1)<br />> + list_add_tail(&qp->piowait, &dev->piowait);<br />> + spin_unlock_irqrestore(&dev->pending_lock, flags);<br />> + /*<br />> + * Note that as soon as ipath_layer_want_buffer() is called and<br />> + * possibly before it returns, ipath_ib_piobufavail()<br />> + * could be called. If we are still in the tasklet function,<br />> + * tasklet_schedule() will not call us until the next time<br />> + * tasklet_schedule() is called.<br />> + * We clear the tasklet flag now since we are committing to return<br />> + * from the tasklet function.<br />> + */<br />> + tasklet_unlock(&qp->s_task);<br />> + ipath_layer_want_buffer(dev->ib_unit);<br />> + dev->n_piowait++;<br />> +}<br />> +<br />> +/*<br />> + * Process entries in the send work queue until the queue is exhausted.<br />> + * Only allow one CPU to send a packet per QP (tasklet).<br />> + * Otherwise, after we drop the QP lock, two threads could send<br />> + * packets out of order.<br />> + * This is similar to do_rc_send() below except we don't have timeouts or<br />> + * resends.<br />> + */<br />> +static void do_uc_send(unsigned long data)<br />> +{<br />> + struct ipath_qp *qp = (struct ipath_qp *)data;<br />> + struct ipath_ibdev *dev = to_idev(qp->ibqp.device);<br />> + struct ipath_swqe *wqe;<br />> + unsigned long flags;<br />> + u16 lrh0;<br />> + u32 hwords;<br />> + u32 nwords;<br />> + u32 extra_bytes;<br />> + u32 bth0;<br />> + u32 bth2;<br />> + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);<br />> + u32 len;<br />> + struct ipath_other_headers *ohdr;<br />> + struct ib_wc wc;<br />> +<br />> + if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags))<br />> + return;<br />> +<br />> + if (unlikely(qp->remote_ah_attr.dlid ==<br />> + ipath_layer_get_lid(dev->ib_unit))) {<br />> + /* Pass in an uninitialized ib_wc to save stack space. */<br />> + ipath_ruc_loopback(qp, &wc);<br />> + clear_bit(IPATH_S_BUSY, &qp->s_flags);<br />> + return;<br />> + }<br />> +<br />> + ohdr = &qp->s_hdr.u.oth;<br />> + if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)<br />> + ohdr = &qp->s_hdr.u.l.oth;<br />> +<br />> +again:<br />> + /* Check for a constructed packet to be sent. */<br />> + if (qp->s_hdrwords != 0) {<br />> + /*<br />> + * If no PIO bufs are available, return.<br />> + * An interrupt will call ipath_ib_piobufavail()<br />> + * when one is available.<br />> + */<br />> + if (ipath_verbs_send(dev->ib_unit, qp->s_hdrwords,<br />> + (uint32_t *) &qp->s_hdr,<br />> + qp->s_cur_size, qp->s_cur_sge)) {<br />> + no_bufs_available(qp, dev);<br />> + return;<br />> + }<br />> + /* Record that we sent the packet and s_hdr is empty. */<br />> + qp->s_hdrwords = 0;<br />> + }<br />> +<br />> + lrh0 = IPS_LRH_BTH;<br />> + /* header size in 32-bit words LRH+BTH = (8+12)/4. 
*/<br />> + hwords = 5;<br />> +<br />> + /*<br />> + * The lock is needed to synchronize between<br />> + * setting qp->s_ack_state and post_send().<br />> + */<br />> + spin_lock_irqsave(&qp->s_lock, flags);<br />> +<br />> + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK))<br />> + goto done;<br />> +<br />> + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index);<br />> +<br />> + /* Send a request. */<br />> + wqe = get_swqe_ptr(qp, qp->s_last);<br />> + switch (qp->s_state) {<br />> + default:<br />> + /* Signal the completion of the last send (if there is one). */<br />> + if (qp->s_last != qp->s_tail) {<br />> + if (++qp->s_last == qp->s_size)<br />> + qp->s_last = 0;<br />> + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) ||<br />> + (wqe->wr.send_flags & IB_SEND_SIGNALED)) {<br />> + wc.wr_id = wqe->wr.wr_id;<br />> + wc.status = IB_WC_SUCCESS;<br />> + wc.opcode = wc_opcode[wqe->wr.opcode];<br />> + wc.vendor_err = 0;<br />> + wc.byte_len = wqe->length;<br />> + wc.qp_num = qp->ibqp.qp_num;<br />> + wc.src_qp = qp->remote_qpn;<br />> + wc.pkey_index = 0;<br />> + wc.slid = qp->remote_ah_attr.dlid;<br />> + wc.sl = qp->remote_ah_attr.sl;<br />> + wc.dlid_path_bits = 0;<br />> + wc.port_num = 0;<br />> + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc,<br />> + 0);<br />> + }<br />> + wqe = get_swqe_ptr(qp, qp->s_last);<br />> + }<br />> + /* Check if send work queue is empty. */<br />> + if (qp->s_tail == qp->s_head)<br />> + goto done;<br />> + /*<br />> + * Start a new request.<br />> + */<br />> + qp->s_psn = wqe->psn = qp->s_next_psn;<br />> + qp->s_sge.sge = wqe->sg_list[0];<br />> + qp->s_sge.sg_list = wqe->sg_list + 1;<br />> + qp->s_sge.num_sge = wqe->wr.num_sge;<br />> + qp->s_len = len = wqe->length;<br />> + switch (wqe->wr.opcode) {<br />> + case IB_WR_SEND:<br />> + case IB_WR_SEND_WITH_IMM:<br />> + if (len > pmtu) {<br />> + qp->s_state = IB_OPCODE_UC_SEND_FIRST;<br />> + len = pmtu;<br />> + break;<br />> + }<br />> + if (wqe->wr.opcode == IB_WR_SEND) {<br />> + qp->s_state = IB_OPCODE_UC_SEND_ONLY;<br />> + } else {<br />> + qp->s_state =<br />> + IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE;<br />> + /* Immediate data comes after the BTH */<br />> + ohdr->u.imm_data = wqe->wr.imm_data;<br />> + hwords += 1;<br />> + }<br />> + if (wqe->wr.send_flags & IB_SEND_SOLICITED)<br />> + bth0 |= 1 << 23;<br />> + break;<br />> +<br />> + case IB_WR_RDMA_WRITE:<br />> + case IB_WR_RDMA_WRITE_WITH_IMM:<br />> + ohdr->u.rc.reth.vaddr =<br />> + cpu_to_be64(wqe->wr.wr.rdma.remote_addr);<br />> + ohdr->u.rc.reth.rkey =<br />> + cpu_to_be32(wqe->wr.wr.rdma.rkey);<br />> + ohdr->u.rc.reth.length = cpu_to_be32(len);<br />> + hwords += sizeof(struct ib_reth) / 4;<br />> + if (len > pmtu) {<br />> + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_FIRST;<br />> + len = pmtu;<br />> + break;<br />> + }<br />> + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) {<br />> + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_ONLY;<br />> + } else {<br />> + qp->s_state =<br />> + IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE;<br />> + /* Immediate data comes after the RETH */<br />> + ohdr->u.rc.imm_data = wqe->wr.imm_data;<br />> + hwords += 1;<br />> + if (wqe->wr.send_flags & IB_SEND_SOLICITED)<br />> + bth0 |= 1 << 23;<br />> + }<br />> + break;<br />> +<br />> + default:<br />> + goto done;<br />> + }<br />> + if (++qp->s_tail >= qp->s_size)<br />> + qp->s_tail = 0;<br />> + break;<br />> +<br />> + case IB_OPCODE_UC_SEND_FIRST:<br />> + qp->s_state = IB_OPCODE_UC_SEND_MIDDLE;<br />> + /* FALLTHROUGH */<br />> + 
case IB_OPCODE_UC_SEND_MIDDLE:<br />> + len = qp->s_len;<br />> + if (len > pmtu) {<br />> + len = pmtu;<br />> + break;<br />> + }<br />> + if (wqe->wr.opcode == IB_WR_SEND)<br />> + qp->s_state = IB_OPCODE_UC_SEND_LAST;<br />> + else {<br />> + qp->s_state = IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE;<br />> + /* Immediate data comes after the BTH */<br />> + ohdr->u.imm_data = wqe->wr.imm_data;<br />> + hwords += 1;<br />> + }<br />> + if (wqe->wr.send_flags & IB_SEND_SOLICITED)<br />> + bth0 |= 1 << 23;<br />> + break;<br />> +<br />> + case IB_OPCODE_UC_RDMA_WRITE_FIRST:<br />> + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_MIDDLE;<br />> + /* FALLTHROUGH */<br />> + case IB_OPCODE_UC_RDMA_WRITE_MIDDLE:<br />> + len = qp->s_len;<br />> + if (len > pmtu) {<br />> + len = pmtu;<br />> + break;<br />> + }<br />> + if (wqe->wr.opcode == IB_WR_RDMA_WRITE)<br />> + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_LAST;<br />> + else {<br />> + qp->s_state =<br />> + IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE;<br />> + /* Immediate data comes after the BTH */<br />> + ohdr->u.imm_data = wqe->wr.imm_data;<br />> + hwords += 1;<br />> + if (wqe->wr.send_flags & IB_SEND_SOLICITED)<br />> + bth0 |= 1 << 23;<br />> + }<br />> + break;<br />> + }<br />> + bth2 = qp->s_next_psn++ & 0xFFFFFF;<br />> + qp->s_len -= len;<br />> + bth0 |= qp->s_state << 24;<br />> +<br />> + spin_unlock_irqrestore(&qp->s_lock, flags);<br />> +<br />> + /* Construct the header. */<br />> + extra_bytes = (4 - len) & 3;<br />> + nwords = (len + extra_bytes) >> 2;<br />> + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {<br />> + /* Header size in 32-bit words. */<br />> + hwords += 10;<br />> + lrh0 = IPS_LRH_GRH;<br />> + qp->s_hdr.u.l.grh.version_tclass_flow =<br />> + cpu_to_be32((6 << 28) |<br />> + (qp->remote_ah_attr.grh.traffic_class << 20) |<br />> + qp->remote_ah_attr.grh.flow_label);<br />> + qp->s_hdr.u.l.grh.paylen =<br />> + cpu_to_be16(((hwords - 12) + nwords + SIZE_OF_CRC) << 2);<br />> + qp->s_hdr.u.l.grh.next_hdr = 0x1B;<br />> + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit;<br />> + /* The SGID is 32-bit aligned. */<br />> + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix;<br />> + qp->s_hdr.u.l.grh.sgid.global.interface_id =<br />> + ipath_layer_get_guid(dev->ib_unit);<br />> + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid;<br />> + }<br />> + qp->s_hdrwords = hwords;<br />> + qp->s_cur_sge = &qp->s_sge;<br />> + qp->s_cur_size = len;<br />> + lrh0 |= qp->remote_ah_attr.sl << 4;<br />> + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);<br />> + /* DEST LID */<br />> + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);<br />> + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC);<br />> + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit));<br />> + bth0 |= extra_bytes << 20;<br />> + ohdr->bth[0] = cpu_to_be32(bth0);<br />> + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);<br />> + ohdr->bth[2] = cpu_to_be32(bth2);<br />> +<br />> + /* Check for more work to do. 
*/<br />> + goto again;<br />> +<br />> +done:<br />> + spin_unlock_irqrestore(&qp->s_lock, flags);<br />> + clear_bit(IPATH_S_BUSY, &qp->s_flags);<br />> +}<br />> +<br />> +/*<br />> + * Process entries in the send work queue until credit or queue is exhausted.<br />> + * Only allow one CPU to send a packet per QP (tasklet).<br />> + * Otherwise, after we drop the QP s_lock, two threads could send<br />> + * packets out of order.<br />> + */<br />> +static void do_rc_send(unsigned long data)<br />> +{<br />> + struct ipath_qp *qp = (struct ipath_qp *)data;<br />> + struct ipath_ibdev *dev = to_idev(qp->ibqp.device);<br />> + struct ipath_swqe *wqe;<br />> + struct ipath_sge_state *ss;<br />> + unsigned long flags;<br />> + u16 lrh0;<br />> + u32 hwords;<br />> + u32 nwords;<br />> + u32 extra_bytes;<br />> + u32 bth0;<br />> + u32 bth2;<br />> + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);<br />> + u32 len;<br />> + struct ipath_other_headers *ohdr;<br />> + char newreq;<br />> +<br />> + if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags))<br />> + return;<br />> +<br />> + if (unlikely(qp->remote_ah_attr.dlid ==<br />> + ipath_layer_get_lid(dev->ib_unit))) {<br />> + struct ib_wc wc;<br />> +<br />> + /*<br />> + * Pass in an uninitialized ib_wc to be consistent with<br />> + * other places where ipath_ruc_loopback() is called.<br />> + */<br />> + ipath_ruc_loopback(qp, &wc);<br />> + clear_bit(IPATH_S_BUSY, &qp->s_flags);<br />> + return;<br />> + }<br />> +<br />> + ohdr = &qp->s_hdr.u.oth;<br />> + if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)<br />> + ohdr = &qp->s_hdr.u.l.oth;<br />> +<br />> +again:<br />> + /* Check for a constructed packet to be sent. */<br />> + if (qp->s_hdrwords != 0) {<br />> + /*<br />> + * If no PIO bufs are available, return.<br />> + * An interrupt will call ipath_ib_piobufavail()<br />> + * when one is available.<br />> + */<br />> + if (ipath_verbs_send(dev->ib_unit, qp->s_hdrwords,<br />> + (uint32_t *) &qp->s_hdr,<br />> + qp->s_cur_size, qp->s_cur_sge)) {<br />> + no_bufs_available(qp, dev);<br />> + return;<br />> + }<br />> + /* Record that we sent the packet and s_hdr is empty. */<br />> + qp->s_hdrwords = 0;<br />> + }<br />> +<br />> + lrh0 = IPS_LRH_BTH;<br />> + /* header size in 32-bit words LRH+BTH = (8+12)/4. */<br />> + hwords = 5;<br />> +<br />> + /*<br />> + * The lock is needed to synchronize between<br />> + * setting qp->s_ack_state, resend timer, and post_send().<br />> + */<br />> + spin_lock_irqsave(&qp->s_lock, flags);<br />> +<br />> + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index);<br />> +<br />> + /* Sending responses has higher priority over sending requests. 
*/<br />> + if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE) {<br />> + /*<br />> + * Send a response.<br />> + * Note that we are in the responder's side of the QP context.<br />> + */<br />> + switch (qp->s_ack_state) {<br />> + case IB_OPCODE_RC_RDMA_READ_REQUEST:<br />> + ss = &qp->s_rdma_sge;<br />> + len = qp->s_rdma_len;<br />> + if (len > pmtu) {<br />> + len = pmtu;<br />> + qp->s_ack_state =<br />> + IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;<br />> + } else {<br />> + qp->s_ack_state =<br />> + IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY;<br />> + }<br />> + qp->s_rdma_len -= len;<br />> + bth0 |= qp->s_ack_state << 24;<br />> + ohdr->u.aeth = ipath_compute_aeth(qp);<br />> + hwords++;<br />> + break;<br />> +<br />> + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST:<br />> + qp->s_ack_state =<br />> + IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;<br />> + /* FALLTHROUGH */<br />> + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:<br />> + ss = &qp->s_rdma_sge;<br />> + len = qp->s_rdma_len;<br />> + if (len > pmtu) {<br />> + len = pmtu;<br />> + } else {<br />> + ohdr->u.aeth = ipath_compute_aeth(qp);<br />> + hwords++;<br />> + qp->s_ack_state =<br />> + IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST;<br />> + }<br />> + qp->s_rdma_len -= len;<br />> + bth0 |= qp->s_ack_state << 24;<br />> + break;<br />> +<br />> + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST:<br />> + case IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY:<br />> + /*<br />> + * We have to prevent new requests from changing<br />> + * the r_sge state while a ipath_verbs_send()<br />> + * is in progress.<br />> + * Changing r_state allows the receiver<br />> + * to continue processing new packets.<br />> + * We do it here now instead of above so<br />> + * that we are sure the packet was sent before<br />> + * changing the state.<br />> + */<br />> + qp->r_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST;<br />> + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;<br />> + goto send_req;<br />> +<br />> + case IB_OPCODE_RC_COMPARE_SWAP:<br />> + case IB_OPCODE_RC_FETCH_ADD:<br />> + ss = NULL;<br />> + len = 0;<br />> + qp->r_state = IB_OPCODE_RC_SEND_LAST;<br />> + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;<br />> + bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24;<br />> + ohdr->u.at.aeth = ipath_compute_aeth(qp);<br />> + ohdr->u.at.atomic_ack_eth =<br />> + cpu_to_be64(qp->s_ack_atomic);<br />> + hwords += sizeof(ohdr->u.at) / 4;<br />> + break;<br />> +<br />> + default:<br />> + /* Send a regular ACK. */<br />> + ss = NULL;<br />> + len = 0;<br />> + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;<br />> + bth0 |= qp->s_ack_state << 24;<br />> + ohdr->u.aeth = ipath_compute_aeth(qp);<br />> + hwords++;<br />> + }<br />> + bth2 = qp->s_ack_psn++ & 0xFFFFFF;<br />> + } else {<br />> + send_req:<br />> + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK) ||<br />> + qp->s_rnr_timeout)<br />> + goto done;<br />> +<br />> + /* Send a request. */<br />> + wqe = get_swqe_ptr(qp, qp->s_cur);<br />> + switch (qp->s_state) {<br />> + default:<br />> + /*<br />> + * Resend an old request or start a new one.<br />> + *<br />> + * We keep track of the current SWQE so that<br />> + * we don't reset the "furthest progress" state<br />> + * if we need to back up.<br />> + */<br />> + newreq = 0;<br />> + if (qp->s_cur == qp->s_tail) {<br />> + /* Check if send work queue is empty. 
*/<br />> + if (qp->s_tail == qp->s_head)<br />> + goto done;<br />> + qp->s_psn = wqe->psn = qp->s_next_psn;<br />> + newreq = 1;<br />> + }<br />> + /*<br />> + * Note that we have to be careful not to modify the<br />> + * original work request since we may need to resend<br />> + * it.<br />> + */<br />> + qp->s_sge.sge = wqe->sg_list[0];<br />> + qp->s_sge.sg_list = wqe->sg_list + 1;<br />> + qp->s_sge.num_sge = wqe->wr.num_sge;<br />> + qp->s_len = len = wqe->length;<br />> + ss = &qp->s_sge;<br />> + bth2 = 0;<br />> + switch (wqe->wr.opcode) {<br />> + case IB_WR_SEND:<br />> + case IB_WR_SEND_WITH_IMM:<br />> + /* If no credit, return. */<br />> + if (qp->s_lsn != (u32) -1 &&<br />> + cmp24(wqe->ssn, qp->s_lsn + 1) > 0) {<br />> + goto done;<br />> + }<br />> + wqe->lpsn = wqe->psn;<br />> + if (len > pmtu) {<br />> + wqe->lpsn += (len - 1) / pmtu;<br />> + qp->s_state = IB_OPCODE_RC_SEND_FIRST;<br />> + len = pmtu;<br />> + break;<br />> + }<br />> + if (wqe->wr.opcode == IB_WR_SEND) {<br />> + qp->s_state = IB_OPCODE_RC_SEND_ONLY;<br />> + } else {<br />> + qp->s_state =<br />> + IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE;<br />> + /* Immediate data comes after the BTH */<br />> + ohdr->u.imm_data = wqe->wr.imm_data;<br />> + hwords += 1;<br />> + }<br />> + if (wqe->wr.send_flags & IB_SEND_SOLICITED)<br />> + bth0 |= 1 << 23;<br />> + bth2 = 1 << 31; /* Request ACK. */<br />> + if (++qp->s_cur == qp->s_size)<br />> + qp->s_cur = 0;<br />> + break;<br />> +<br />> + case IB_WR_RDMA_WRITE:<br />> + if (newreq)<br />> + qp->s_lsn++;<br />> + /* FALLTHROUGH */<br />> + case IB_WR_RDMA_WRITE_WITH_IMM:<br />> + /* If no credit, return. */<br />> + if (qp->s_lsn != (u32) -1 &&<br />> + cmp24(wqe->ssn, qp->s_lsn + 1) > 0) {<br />> + goto done;<br />> + }<br />> + ohdr->u.rc.reth.vaddr =<br />> + cpu_to_be64(wqe->wr.wr.rdma.remote_addr);<br />> + ohdr->u.rc.reth.rkey =<br />> + cpu_to_be32(wqe->wr.wr.rdma.rkey);<br />> + ohdr->u.rc.reth.length = cpu_to_be32(len);<br />> + hwords += sizeof(struct ib_reth) / 4;<br />> + wqe->lpsn = wqe->psn;<br />> + if (len > pmtu) {<br />> + wqe->lpsn += (len - 1) / pmtu;<br />> + qp->s_state =<br />> + IB_OPCODE_RC_RDMA_WRITE_FIRST;<br />> + len = pmtu;<br />> + break;<br />> + }<br />> + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) {<br />> + qp->s_state =<br />> + IB_OPCODE_RC_RDMA_WRITE_ONLY;<br />> + } else {<br />> + qp->s_state =<br />> + IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE;<br />> + /* Immediate data comes after RETH */<br />> + ohdr->u.rc.imm_data = wqe->wr.imm_data;<br />> + hwords += 1;<br />> + if (wqe->wr.<br />> + send_flags & IB_SEND_SOLICITED)<br />> + bth0 |= 1 << 23;<br />> + }<br />> + bth2 = 1 << 31; /* Request ACK. 
*/<br />> + if (++qp->s_cur == qp->s_size)<br />> + qp->s_cur = 0;<br />> + break;<br />> +<br />> + case IB_WR_RDMA_READ:<br />> + ohdr->u.rc.reth.vaddr =<br />> + cpu_to_be64(wqe->wr.wr.rdma.remote_addr);<br />> + ohdr->u.rc.reth.rkey =<br />> + cpu_to_be32(wqe->wr.wr.rdma.rkey);<br />> + ohdr->u.rc.reth.length = cpu_to_be32(len);<br />> + qp->s_state = IB_OPCODE_RC_RDMA_READ_REQUEST;<br />> + hwords += sizeof(ohdr->u.rc.reth) / 4;<br />> + if (newreq) {<br />> + qp->s_lsn++;<br />> + /*<br />> + * Adjust s_next_psn to count the<br />> + * expected number of responses.<br />> + */<br />> + if (len > pmtu)<br />> + qp->s_next_psn +=<br />> + (len - 1) / pmtu;<br />> + wqe->lpsn = qp->s_next_psn++;<br />> + }<br />> + ss = NULL;<br />> + len = 0;<br />> + if (++qp->s_cur == qp->s_size)<br />> + qp->s_cur = 0;<br />> + break;<br />> +<br />> + case IB_WR_ATOMIC_CMP_AND_SWP:<br />> + case IB_WR_ATOMIC_FETCH_AND_ADD:<br />> + qp->s_state =<br />> + wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ?<br />> + IB_OPCODE_RC_COMPARE_SWAP :<br />> + IB_OPCODE_RC_FETCH_ADD;<br />> + ohdr->u.atomic_eth.vaddr =<br />> + cpu_to_be64(wqe->wr.wr.atomic.remote_addr);<br />> + ohdr->u.atomic_eth.rkey =<br />> + cpu_to_be32(wqe->wr.wr.atomic.rkey);<br />> + ohdr->u.atomic_eth.swap_data =<br />> + cpu_to_be64(wqe->wr.wr.atomic.swap);<br />> + ohdr->u.atomic_eth.compare_data =<br />> + cpu_to_be64(wqe->wr.wr.atomic.compare_add);<br />> + hwords += sizeof(struct ib_atomic_eth) / 4;<br />> + if (newreq) {<br />> + qp->s_lsn++;<br />> + wqe->lpsn = wqe->psn;<br />> + }<br />> + if (++qp->s_cur == qp->s_size)<br />> + qp->s_cur = 0;<br />> + ss = NULL;<br />> + len = 0;<br />> + break;<br />> +<br />> + default:<br />> + goto done;<br />> + }<br />> + if (newreq) {<br />> + if (++qp->s_tail >= qp->s_size)<br />> + qp->s_tail = 0;<br />> + }<br />> + bth2 |= qp->s_psn++ & 0xFFFFFF;<br />> + if ((int)(qp->s_psn - qp->s_next_psn) > 0)<br />> + qp->s_next_psn = qp->s_psn;<br />> + spin_lock(&dev->pending_lock);<br />> + if (qp->timerwait.next == LIST_POISON1) {<br />> + list_add_tail(&qp->timerwait,<br />> + &dev->pending[dev-><br />> + pending_index]);<br />> + }<br />> + spin_unlock(&dev->pending_lock);<br />> + break;<br />> +<br />> + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST:<br />> + /*<br />> + * This case can only happen if a send is<br />> + * restarted. See ipath_restart_rc().<br />> + */<br />> + ipath_init_restart(qp, wqe);<br />> + /* FALLTHROUGH */<br />> + case IB_OPCODE_RC_SEND_FIRST:<br />> + qp->s_state = IB_OPCODE_RC_SEND_MIDDLE;<br />> + /* FALLTHROUGH */<br />> + case IB_OPCODE_RC_SEND_MIDDLE:<br />> + bth2 = qp->s_psn++ & 0xFFFFFF;<br />> + if ((int)(qp->s_psn - qp->s_next_psn) > 0)<br />> + qp->s_next_psn = qp->s_psn;<br />> + ss = &qp->s_sge;<br />> + len = qp->s_len;<br />> + if (len > pmtu) {<br />> + /*<br />> + * Request an ACK every 1/2 MB to avoid<br />> + * retransmit timeouts.<br />> + */<br />> + if (((wqe->length - len) % (512 * 1024)) == 0)<br />> + bth2 |= 1 << 31;<br />> + len = pmtu;<br />> + break;<br />> + }<br />> + if (wqe->wr.opcode == IB_WR_SEND)<br />> + qp->s_state = IB_OPCODE_RC_SEND_LAST;<br />> + else {<br />> + qp->s_state =<br />> + IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE;<br />> + /* Immediate data comes after the BTH */<br />> + ohdr->u.imm_data = wqe->wr.imm_data;<br />> + hwords += 1;<br />> + }<br />> + if (wqe->wr.send_flags & IB_SEND_SOLICITED)<br />> + bth0 |= 1 << 23;<br />> + bth2 |= 1 << 31; /* Request ACK. 
*/<br />> + if (++qp->s_cur >= qp->s_size)<br />> + qp->s_cur = 0;<br />> + break;<br />> +<br />> + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST:<br />> + /*<br />> + * This case can only happen if a RDMA write is<br />> + * restarted. See ipath_restart_rc().<br />> + */<br />> + ipath_init_restart(qp, wqe);<br />> + /* FALLTHROUGH */<br />> + case IB_OPCODE_RC_RDMA_WRITE_FIRST:<br />> + qp->s_state = IB_OPCODE_RC_RDMA_WRITE_MIDDLE;<br />> + /* FALLTHROUGH */<br />> + case IB_OPCODE_RC_RDMA_WRITE_MIDDLE:<br />> + bth2 = qp->s_psn++ & 0xFFFFFF;<br />> + if ((int)(qp->s_psn - qp->s_next_psn) > 0)<br />> + qp->s_next_psn = qp->s_psn;<br />> + ss = &qp->s_sge;<br />> + len = qp->s_len;<br />> + if (len > pmtu) {<br />> + /*<br />> + * Request an ACK every 1/2 MB to avoid<br />> + * retransmit timeouts.<br />> + */<br />> + if (((wqe->length - len) % (512 * 1024)) == 0)<br />> + bth2 |= 1 << 31;<br />> + len = pmtu;<br />> + break;<br />> + }<br />> + if (wqe->wr.opcode == IB_WR_RDMA_WRITE)<br />> + qp->s_state = IB_OPCODE_RC_RDMA_WRITE_LAST;<br />> + else {<br />> + qp->s_state =<br />> + IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE;<br />> + /* Immediate data comes after the BTH */<br />> + ohdr->u.imm_data = wqe->wr.imm_data;<br />> + hwords += 1;<br />> + if (wqe->wr.send_flags & IB_SEND_SOLICITED)<br />> + bth0 |= 1 << 23;<br />> + }<br />> + bth2 |= 1 << 31; /* Request ACK. */<br />> + if (++qp->s_cur >= qp->s_size)<br />> + qp->s_cur = 0;<br />> + break;<br />> +<br />> + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:<br />> + /*<br />> + * This case can only happen if a RDMA read is<br />> + * restarted. See ipath_restart_rc().<br />> + */<br />> + ipath_init_restart(qp, wqe);<br />> + len = ((qp->s_psn - wqe->psn) & 0xFFFFFF) * pmtu;<br />> + ohdr->u.rc.reth.vaddr =<br />> + cpu_to_be64(wqe->wr.wr.rdma.remote_addr + len);<br />> + ohdr->u.rc.reth.rkey =<br />> + cpu_to_be32(wqe->wr.wr.rdma.rkey);<br />> + ohdr->u.rc.reth.length = cpu_to_be32(qp->s_len);<br />> + qp->s_state = IB_OPCODE_RC_RDMA_READ_REQUEST;<br />> + hwords += sizeof(ohdr->u.rc.reth) / 4;<br />> + bth2 = qp->s_psn++ & 0xFFFFFF;<br />> + if ((int)(qp->s_psn - qp->s_next_psn) > 0)<br />> + qp->s_next_psn = qp->s_psn;<br />> + ss = NULL;<br />> + len = 0;<br />> + if (++qp->s_cur == qp->s_size)<br />> + qp->s_cur = 0;<br />> + break;<br />> +<br />> + case IB_OPCODE_RC_RDMA_READ_REQUEST:<br />> + case IB_OPCODE_RC_COMPARE_SWAP:<br />> + case IB_OPCODE_RC_FETCH_ADD:<br />> + /*<br />> + * We shouldn't start anything new until this request<br />> + * is finished. The ACK will handle rescheduling us.<br />> + * XXX The number of outstanding ones is negotiated<br />> + * at connection setup time (see pg. 258,289)?<br />> + * XXX Also, if we support multiple outstanding<br />> + * requests, we need to check the WQE IB_SEND_FENCE<br />> + * flag and not send a new request if a RDMA read or<br />> + * atomic is pending.<br />> + */<br />> + goto done;<br />> + }<br />> + qp->s_len -= len;<br />> + bth0 |= qp->s_state << 24;<br />> + /* XXX queue resend timeout. */<br />> + }<br />> + /* Make sure it is non-zero before dropping the lock. */<br />> + qp->s_hdrwords = hwords;<br />> + spin_unlock_irqrestore(&qp->s_lock, flags);<br />> +<br />> + /* Construct the header. */<br />> + extra_bytes = (4 - len) & 3;<br />> + nwords = (len + extra_bytes) >> 2;<br />> + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {<br />> + /* Header size in 32-bit words. 
*/<br />> + hwords += 10;<br />> + lrh0 = IPS_LRH_GRH;<br />> + qp->s_hdr.u.l.grh.version_tclass_flow =<br />> + cpu_to_be32((6 << 28) |<br />> + (qp->remote_ah_attr.grh.traffic_class << 20) |<br />> + qp->remote_ah_attr.grh.flow_label);<br />> + qp->s_hdr.u.l.grh.paylen =<br />> + cpu_to_be16(((hwords - 12) + nwords + SIZE_OF_CRC) << 2);<br />> + qp->s_hdr.u.l.grh.next_hdr = 0x1B;<br />> + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit;<br />> + /* The SGID is 32-bit aligned. */<br />> + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix;<br />> + qp->s_hdr.u.l.grh.sgid.global.interface_id =<br />> + ipath_layer_get_guid(dev->ib_unit);<br />> + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid;<br />> + qp->s_hdrwords = hwords;<br />> + }<br />> + qp->s_cur_sge = ss;<br />> + qp->s_cur_size = len;<br />> + lrh0 |= qp->remote_ah_attr.sl << 4;<br />> + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);<br />> + /* DEST LID */<br />> + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);<br />> + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC);<br />> + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit));<br />> + bth0 |= extra_bytes << 20;<br />> + ohdr->bth[0] = cpu_to_be32(bth0);<br />> + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);<br />> + ohdr->bth[2] = cpu_to_be32(bth2);<br />> +<br />> + /* Check for more work to do. */<br />> + goto again;<br />> +<br />> +done:<br />> + spin_unlock_irqrestore(&qp->s_lock, flags);<br />> + clear_bit(IPATH_S_BUSY, &qp->s_flags);<br />> +}<br />> +<br />> +static void send_rc_ack(struct ipath_qp *qp)<br />> +{<br />> + struct ipath_ibdev *dev = to_idev(qp->ibqp.device);<br />> + u16 lrh0;<br />> + u32 bth0;<br />> + u32 hwords;<br />> + struct ipath_other_headers *ohdr;<br />> +<br />> + /* Construct the header. */<br />> + ohdr = &qp->s_hdr.u.oth;<br />> + lrh0 = IPS_LRH_BTH;<br />> + /* header size in 32-bit words LRH+BTH+AETH = (8+12+4)/4. */<br />> + hwords = 6;<br />> + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {<br />> + ohdr = &qp->s_hdr.u.l.oth;<br />> + /* Header size in 32-bit words. */<br />> + hwords += 10;<br />> + lrh0 = IPS_LRH_GRH;<br />> + qp->s_hdr.u.l.grh.version_tclass_flow =<br />> + cpu_to_be32((6 << 28) |<br />> + (qp->remote_ah_attr.grh.traffic_class << 20) |<br />> + qp->remote_ah_attr.grh.flow_label);<br />> + qp->s_hdr.u.l.grh.paylen =<br />> + cpu_to_be16(((hwords - 12) + SIZE_OF_CRC) << 2);<br />> + qp->s_hdr.u.l.grh.next_hdr = 0x1B;<br />> + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit;<br />> + /* The SGID is 32-bit aligned. 
*/<br />> + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix;<br />> + qp->s_hdr.u.l.grh.sgid.global.interface_id =<br />> + ipath_layer_get_guid(dev->ib_unit);<br />> + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid;<br />> + }<br />> + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index);<br />> + ohdr->u.aeth = ipath_compute_aeth(qp);<br />> + if (qp->s_ack_state >= IB_OPCODE_RC_COMPARE_SWAP) {<br />> + bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24;<br />> + ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic);<br />> + hwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4;<br />> + } else {<br />> + bth0 |= IB_OPCODE_RC_ACKNOWLEDGE << 24;<br />> + }<br />> + lrh0 |= qp->remote_ah_attr.sl << 4;<br />> + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);<br />> + /* DEST LID */<br />> + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);<br />> + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC);<br />> + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit));<br />> + ohdr->bth[0] = cpu_to_be32(bth0);<br />> + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);<br />> + ohdr->bth[2] = cpu_to_be32(qp->s_ack_psn & 0xFFFFFF);<br />> +<br />> + /*<br />> + * If we can send the ACK, clear the ACK state.<br />> + */<br />> + if (ipath_verbs_send(dev->ib_unit, hwords, (uint32_t *) &qp->s_hdr,<br />> + 0, NULL) == 0) {<br />> + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;<br />> + dev->n_rc_qacks++;<br />> + }<br />> +}<br />> +<br />> +/*<br />> + * Back up the requester to resend the last un-ACKed request.<br />> + * The QP s_lock should be held.<br />> + */<br />> +static void ipath_restart_rc(struct ipath_qp *qp, u32 psn, struct ib_wc *wc)<br />> +{<br />> + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last);<br />> + struct ipath_ibdev *dev;<br />> + u32 n;<br />> +<br />> + /*<br />> + * If there are no requests pending, we are done.<br />> + */<br />> + if (cmp24(psn, qp->s_next_psn) >= 0 || qp->s_last == qp->s_tail)<br />> + goto done;<br />> +<br />> + if (qp->s_retry == 0) {<br />> + wc->wr_id = wqe->wr.wr_id;<br />> + wc->status = IB_WC_RETRY_EXC_ERR;<br />> + wc->opcode = wc_opcode[wqe->wr.opcode];<br />> + wc->vendor_err = 0;<br />> + wc->byte_len = 0;<br />> + wc->qp_num = qp->ibqp.qp_num;<br />> + wc->src_qp = qp->remote_qpn;<br />> + wc->pkey_index = 0;<br />> + wc->slid = qp->remote_ah_attr.dlid;<br />> + wc->sl = qp->remote_ah_attr.sl;<br />> + wc->dlid_path_bits = 0;<br />> + wc->port_num = 0;<br />> + ipath_sqerror_qp(qp, wc);<br />> + return;<br />> + }<br />> + qp->s_retry--;<br />> +<br />> + /*<br />> + * Remove the QP from the timeout queue.<br />> + * Note: it may already have been removed by ipath_ib_timer().<br />> + */<br />> + dev = to_idev(qp->ibqp.device);<br />> + spin_lock(&dev->pending_lock);<br />> + if (qp->timerwait.next != LIST_POISON1)<br />> + list_del(&qp->timerwait);<br />> + spin_unlock(&dev->pending_lock);<br />> +<br />> + if (wqe->wr.opcode == IB_WR_RDMA_READ)<br />> + dev->n_rc_resends++;<br />> + else<br />> + dev->n_rc_resends += (int)qp->s_psn - (int)psn;<br />> +<br />> + /*<br />> + * If we are starting the request from the beginning, let the<br />> + * normal send code handle initialization.<br />> + */<br />> + qp->s_cur = qp->s_last;<br />> + if (cmp24(psn, wqe->psn) <= 0) {<br />> + qp->s_state = IB_OPCODE_RC_SEND_LAST;<br />> + qp->s_psn = wqe->psn;<br />> + } else {<br />> + n = qp->s_cur;<br />> + for (;;) {<br />> + if (++n == qp->s_size)<br />> + n = 0;<br />> + if (n == qp->s_tail) {<br />> + if (cmp24(psn, 
qp->s_next_psn) >= 0) {<br />> + qp->s_cur = n;<br />> + wqe = get_swqe_ptr(qp, n);<br />> + }<br />> + break;<br />> + }<br />> + wqe = get_swqe_ptr(qp, n);<br />> + if (cmp24(psn, wqe->psn) < 0)<br />> + break;<br />> + qp->s_cur = n;<br />> + }<br />> + qp->s_psn = psn;<br />> +<br />> + /*<br />> + * Reset the state to restart in the middle of a request.<br />> + * Don't change the s_sge, s_cur_sge, or s_cur_size.<br />> + * See do_rc_send().<br />> + */<br />> + switch (wqe->wr.opcode) {<br />> + case IB_WR_SEND:<br />> + case IB_WR_SEND_WITH_IMM:<br />> + qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;<br />> + break;<br />> +<br />> + case IB_WR_RDMA_WRITE:<br />> + case IB_WR_RDMA_WRITE_WITH_IMM:<br />> + qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST;<br />> + break;<br />> +<br />> + case IB_WR_RDMA_READ:<br />> + qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;<br />> + break;<br />> +<br />> + default:<br />> + /*<br />> + * This case shouldn't happen since its only<br />> + * one PSN per req.<br />> + */<br />> + qp->s_state = IB_OPCODE_RC_SEND_LAST;<br />> + }<br />> + }<br />> +<br />> +done:<br />> + tasklet_schedule(&qp->s_task);<br />> +}<br />> +<br />> +/*<br />> + * Handle RC and UC post sends.<br />> + */<br />> +static int ipath_post_rc_send(struct ipath_qp *qp, struct ib_send_wr *wr)<br />> +{<br />> + struct ipath_swqe *wqe;<br />> + unsigned long flags;<br />> + u32 next;<br />> + int i, j;<br />> + int acc;<br />> +<br />> + /*<br />> + * Don't allow RDMA reads or atomic operations on UC or<br />> + * undefined operations.<br />> + * Make sure buffer is large enough to hold the result for atomics.<br />> + */<br />> + if (qp->ibqp.qp_type == IB_QPT_UC) {<br />> + if ((unsigned) wr->opcode >= IB_WR_RDMA_READ)<br />> + return -EINVAL;<br />> + } else if ((unsigned) wr->opcode > IB_WR_ATOMIC_FETCH_AND_ADD)<br />> + return -EINVAL;<br />> + else if (wr->opcode >= IB_WR_ATOMIC_CMP_AND_SWP &&<br />> + (wr->num_sge == 0 || wr->sg_list[0].length < sizeof(u64) ||<br />> + wr->sg_list[0].addr & 0x7))<br />> + return -EINVAL;<br />> +<br />> + /* IB spec says that num_sge == 0 is OK. */<br />> + if (wr->num_sge > qp->s_max_sge)<br />> + return -ENOMEM;<br />> +<br />> + spin_lock_irqsave(&qp->s_lock, flags);<br />> + next = qp->s_head + 1;<br />> + if (next >= qp->s_size)<br />> + next = 0;<br />> + if (next == qp->s_last) {<br />> + spin_unlock_irqrestore(&qp->s_lock, flags);<br />> + return -EINVAL;<br />> + }<br />> +<br />> + wqe = get_swqe_ptr(qp, qp->s_head);<br />> + wqe->wr = *wr;<br />> + wqe->ssn = qp->s_ssn++;<br />> + wqe->sg_list[0].mr = NULL;<br />> + wqe->sg_list[0].vaddr = NULL;<br />> + wqe->sg_list[0].length = 0;<br />> + wqe->sg_list[0].sge_length = 0;<br />> + wqe->length = 0;<br />> + acc = wr->opcode >= IB_WR_RDMA_READ ? 
IB_ACCESS_LOCAL_WRITE : 0;<br />> + for (i = 0, j = 0; i < wr->num_sge; i++) {<br />> + if (to_ipd(qp->ibqp.pd)->user && wr->sg_list[i].lkey == 0) {<br />> + spin_unlock_irqrestore(&qp->s_lock, flags);<br />> + return -EINVAL;<br />> + }<br />> + if (wr->sg_list[i].length == 0)<br />> + continue;<br />> + if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table,<br />> + &wqe->sg_list[j], &wr->sg_list[i], acc)) {<br />> + spin_unlock_irqrestore(&qp->s_lock, flags);<br />> + return -EINVAL;<br />> + }<br />> + wqe->length += wr->sg_list[i].length;<br />> + j++;<br />> + }<br />> + wqe->wr.num_sge = j;<br />> + qp->s_head = next;<br />> + /*<br />> + * Wake up the send tasklet if the QP is not waiting<br />> + * for an RNR timeout.<br />> + */<br />> + next = qp->s_rnr_timeout;<br />> + spin_unlock_irqrestore(&qp->s_lock, flags);<br />> +<br />> + if (next == 0) {<br />> + if (qp->ibqp.qp_type == IB_QPT_UC)<br />> + do_uc_send((unsigned long) qp);<br />> + else<br />> + do_rc_send((unsigned long) qp);<br />> + }<br />> + return 0;<br />> +}<br />> +<br />> +/*<br />> + * Note that we actually send the data as it is posted instead of putting<br />> + * the request into a ring buffer. If we wanted to use a ring buffer,<br />> + * we would need to save a reference to the destination address in the SWQE.<br />> + */<br />> +static int ipath_post_ud_send(struct ipath_qp *qp, struct ib_send_wr *wr)<br />> +{<br />> + struct ipath_ibdev *dev = to_idev(qp->ibqp.device);<br />> + struct ipath_other_headers *ohdr;<br />> + struct ib_ah_attr *ah_attr;<br />> + struct ipath_sge_state ss;<br />> + struct ipath_sge *sg_list;<br />> + struct ib_wc wc;<br />> + u32 hwords;<br />> + u32 nwords;<br />> + u32 len;<br />> + u32 extra_bytes;<br />> + u32 bth0;<br />> + u16 lrh0;<br />> + u16 lid;<br />> + int i;<br />> +<br />> + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK))<br />> + return 0;<br />> +<br />> + /* IB spec says that num_sge == 0 is OK. */<br />> + if (wr->num_sge > qp->s_max_sge)<br />> + return -EINVAL;<br />> +<br />> + if (wr->num_sge > 1) {<br />> + sg_list = kmalloc((qp->s_max_sge - 1) * sizeof(*sg_list),<br />> + GFP_ATOMIC);<br />> + if (!sg_list)<br />> + return -ENOMEM;<br />> + } else<br />> + sg_list = NULL;<br />> +<br />> + /* Check the buffer to send. */<br />> + ss.sg_list = sg_list;<br />> + ss.sge.mr = NULL;<br />> + ss.sge.vaddr = NULL;<br />> + ss.sge.length = 0;<br />> + ss.sge.sge_length = 0;<br />> + ss.num_sge = 0;<br />> + len = 0;<br />> + for (i = 0; i < wr->num_sge; i++) {<br />> + /* Check LKEY */<br />> + if (to_ipd(qp->ibqp.pd)->user && wr->sg_list[i].lkey == 0)<br />> + return -EINVAL;<br />> +<br />> + if (wr->sg_list[i].length == 0)<br />> + continue;<br />> + if (!ipath_lkey_ok(&dev->lk_table, ss.num_sge ?<br />> + sg_list + ss.num_sge : &ss.sge,<br />> + &wr->sg_list[i], 0)) {<br />> + return -EINVAL;<br />> + }<br />> + len += wr->sg_list[i].length;<br />> + ss.num_sge++;<br />> + }<br />> + extra_bytes = (4 - len) & 3;<br />> + nwords = (len + extra_bytes) >> 2;<br />> +<br />> + /* Construct the header. */<br />> + ah_attr = &to_iah(wr->wr.ud.ah)->attr;<br />> + if (ah_attr->dlid >= 0xC000 && ah_attr->dlid < 0xFFFF)<br />> + dev->n_multicast_xmit++;<br />> + if (unlikely(ah_attr->dlid == ipath_layer_get_lid(dev->ib_unit))) {<br />> + /* Pass in an uninitialized ib_wc to save stack space. 
*/<br />> + ipath_ud_loopback(qp, &ss, len, wr, &wc);<br />> + goto done;<br />> + }<br />> + if (ah_attr->ah_flags & IB_AH_GRH) {<br />> + /* Header size in 32-bit words. */<br />> + hwords = 17;<br />> + lrh0 = IPS_LRH_GRH;<br />> + ohdr = &qp->s_hdr.u.l.oth;<br />> + qp->s_hdr.u.l.grh.version_tclass_flow =<br />> + cpu_to_be32((6 << 28) |<br />> + (ah_attr->grh.traffic_class << 20) |<br />> + ah_attr->grh.flow_label);<br />> + qp->s_hdr.u.l.grh.paylen =<br />> + cpu_to_be16(((wr->opcode ==<br />> + IB_WR_SEND_WITH_IMM ? 6 : 5) + nwords +<br />> + SIZE_OF_CRC) << 2);<br />> + qp->s_hdr.u.l.grh.next_hdr = 0x1B;<br />> + qp->s_hdr.u.l.grh.hop_limit = ah_attr->grh.hop_limit;<br />> + /* The SGID is 32-bit aligned. */<br />> + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix;<br />> + qp->s_hdr.u.l.grh.sgid.global.interface_id =<br />> + ipath_layer_get_guid(dev->ib_unit);<br />> + qp->s_hdr.u.l.grh.dgid = ah_attr->grh.dgid;<br />> + /*<br />> + * Don't worry about sending to locally attached<br />> + * multicast QPs. It is unspecified by the spec. what happens.<br />> + */<br />> + } else {<br />> + /* Header size in 32-bit words. */<br />> + hwords = 7;<br />> + lrh0 = IPS_LRH_BTH;<br />> + ohdr = &qp->s_hdr.u.oth;<br />> + }<br />> + if (wr->opcode == IB_WR_SEND_WITH_IMM) {<br />> + ohdr->u.ud.imm_data = wr->imm_data;<br />> + wc.imm_data = wr->imm_data;<br />> + hwords += 1;<br />> + bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24;<br />> + } else if (wr->opcode == IB_WR_SEND) {<br />> + wc.imm_data = 0;<br />> + bth0 = IB_OPCODE_UD_SEND_ONLY << 24;<br />> + } else<br />> + return -EINVAL;<br />> + lrh0 |= ah_attr->sl << 4;<br />> + if (qp->ibqp.qp_type == IB_QPT_SMI)<br />> + lrh0 |= 0xF000; /* Set VL */<br />> + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);<br />> + qp->s_hdr.lrh[1] = cpu_to_be16(ah_attr->dlid); /* DEST LID */<br />> + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC);<br />> + lid = ipath_layer_get_lid(dev->ib_unit);<br />> + qp->s_hdr.lrh[3] = lid ? cpu_to_be16(lid) : IB_LID_PERMISSIVE;<br />> + if (wr->send_flags & IB_SEND_SOLICITED)<br />> + bth0 |= 1 << 23;<br />> + bth0 |= extra_bytes << 20;<br />> + bth0 |= qp->ibqp.qp_type == IB_QPT_SMI ? IPS_DEFAULT_P_KEY :<br />> + ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index);<br />> + ohdr->bth[0] = cpu_to_be32(bth0);<br />> + ohdr->bth[1] = cpu_to_be32(wr->wr.ud.remote_qpn);<br />> + /* XXX Could lose a PSN count but not worth locking */<br />> + ohdr->bth[2] = cpu_to_be32(qp->s_psn++ & 0xFFFFFF);<br />> + /*<br />> + * Qkeys with the high order bit set mean use the<br />> + * qkey from the QP context instead of the WR.<br />> + */<br />> + ohdr->u.ud.deth[0] = cpu_to_be32((int)wr->wr.ud.remote_qkey < 0 ?<br />> + qp->qkey : wr->wr.ud.remote_qkey);<br />> + ohdr->u.ud.deth[1] = cpu_to_be32(qp->ibqp.qp_num);<br />> + if (ipath_verbs_send(dev->ib_unit, hwords, (uint32_t *) &qp->s_hdr,<br />> + len, &ss))<br />> + dev->n_no_piobuf++;<br />> +<br />> +done:<br />> + /* Queue the completion status entry. */<br />> + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) ||<br />> + (wr->send_flags & IB_SEND_SIGNALED)) {<br />> + wc.wr_id = wr->wr_id;<br />> + wc.status = IB_WC_SUCCESS;<br />> + wc.vendor_err = 0;<br />> + wc.opcode = IB_WC_SEND;<br />> + wc.byte_len = len;<br />> + wc.qp_num = qp->ibqp.qp_num;<br />> + wc.src_qp = 0;<br />> + wc.wc_flags = 0;<br />> + /* XXX initialize other fields? 
*/<br />> + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 0);<br />> + }<br />> + kfree(sg_list);<br />> +<br />> + return 0;<br />> +}<br />> +<br />> +/*<br />> + * This may be called from interrupt context.<br />> + */<br />> +static int ipath_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,<br />> + struct ib_send_wr **bad_wr)<br />> +{<br />> + struct ipath_qp *qp = to_iqp(ibqp);<br />> + int err = 0;<br />> +<br />> + /* Check that state is OK to post send. */<br />> + if (!(state_ops[qp->state] & IPATH_POST_SEND_OK)) {<br />> + *bad_wr = wr;<br />> + return -EINVAL;<br />> + }<br />> +<br />> + for (; wr; wr = wr->next) {<br />> + switch (qp->ibqp.qp_type) {<br />> + case IB_QPT_UC:<br />> + case IB_QPT_RC:<br />> + err = ipath_post_rc_send(qp, wr);<br />> + break;<br />> +<br />> + case IB_QPT_SMI:<br />> + case IB_QPT_GSI:<br />> + case IB_QPT_UD:<br />> + err = ipath_post_ud_send(qp, wr);<br />> + break;<br />> +<br />> + default:<br />> + err = -EINVAL;<br />> + }<br />> + if (err) {<br />> + *bad_wr = wr;<br />> + break;<br />> + }<br />> + }<br />> + return err;<br />> +}<br />> +<br />> +/*<br />> + * This may be called from interrupt context.<br />> + */<br />> +static int ipath_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,<br />> + struct ib_recv_wr **bad_wr)<br />> +{<br />> + struct ipath_qp *qp = to_iqp(ibqp);<br />> + unsigned long flags;<br />> +<br />> + /* Check that state is OK to post receive. */<br />> + if (!(state_ops[qp->state] & IPATH_POST_RECV_OK)) {<br />> + *bad_wr = wr;<br />> + return -EINVAL;<br />> + }<br />> +<br />> + for (; wr; wr = wr->next) {<br />> + struct ipath_rwqe *wqe;<br />> + u32 next;<br />> + int i, j;<br />> +<br />> + if (wr->num_sge > qp->r_rq.max_sge) {<br />> + *bad_wr = wr;<br />> + return -ENOMEM;<br />> + }<br />> +<br />> + spin_lock_irqsave(&qp->r_rq.lock, flags);<br />> + next = qp->r_rq.head + 1;<br />> + if (next >= qp->r_rq.size)<br />> + next = 0;<br />> + if (next == qp->r_rq.tail) {<br />> + spin_unlock_irqrestore(&qp->r_rq.lock, flags);<br />> + *bad_wr = wr;<br />> + return -ENOMEM;<br />> + }<br />> +<br />> + wqe = get_rwqe_ptr(&qp->r_rq, qp->r_rq.head);<br />> + wqe->wr_id = wr->wr_id;<br />> + wqe->sg_list[0].mr = NULL;<br />> + wqe->sg_list[0].vaddr = NULL;<br />> + wqe->sg_list[0].length = 0;<br />> + wqe->sg_list[0].sge_length = 0;<br />> + wqe->length = 0;<br />> + for (i = 0, j = 0; i < wr->num_sge; i++) {<br />> + /* Check LKEY */<br />> + if (to_ipd(qp->ibqp.pd)->user &&<br />> + wr->sg_list[i].lkey == 0) {<br />> + spin_unlock_irqrestore(&qp->r_rq.lock, flags);<br />> + *bad_wr = wr;<br />> + return -EINVAL;<br />> + }<br />> + if (wr->sg_list[i].length == 0)<br />> + continue;<br />> + if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table,<br />> + &wqe->sg_list[j], &wr->sg_list[i],<br />> + IB_ACCESS_LOCAL_WRITE)) {<br />> + spin_unlock_irqrestore(&qp->r_rq.lock, flags);<br />> + *bad_wr = wr;<br />> + return -EINVAL;<br />> + }<br />> + wqe->length += wr->sg_list[i].length;<br />> + j++;<br />> + }<br />> + wqe->num_sge = j;<br />> + qp->r_rq.head = next;<br />> + spin_unlock_irqrestore(&qp->r_rq.lock, flags);<br />> + }<br />> + return 0;<br />> +}<br />> +<br />> +/*<br />> + * This may be called from interrupt context.<br />> + */<br />> +static int ipath_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr,<br />> + struct ib_recv_wr **bad_wr)<br />> +{<br />> + struct ipath_srq *srq = to_isrq(ibsrq);<br />> + struct ipath_ibdev *dev = to_idev(ibsrq->device);<br />> + unsigned 
long flags;<br />> +<br />> + for (; wr; wr = wr->next) {<br />> + struct ipath_rwqe *wqe;<br />> + u32 next;<br />> + int i, j;<br />> +<br />> + if (wr->num_sge > srq->rq.max_sge) {<br />> + *bad_wr = wr;<br />> + return -ENOMEM;<br />> + }<br />> +<br />> + spin_lock_irqsave(&srq->rq.lock, flags);<br />> + next = srq->rq.head + 1;<br />> + if (next >= srq->rq.size)<br />> + next = 0;<br />> + if (next == srq->rq.tail) {<br />> + spin_unlock_irqrestore(&srq->rq.lock, flags);<br />> + *bad_wr = wr;<br />> + return -ENOMEM;<br />> + }<br />> +<br />> + wqe = get_rwqe_ptr(&srq->rq, srq->rq.head);<br />> + wqe->wr_id = wr->wr_id;<br />> + wqe->sg_list[0].mr = NULL;<br />> + wqe->sg_list[0].vaddr = NULL;<br />> + wqe->sg_list[0].length = 0;<br />> + wqe->sg_list[0].sge_length = 0;<br />> + wqe->length = 0;<br />> + for (i = 0, j = 0; i < wr->num_sge; i++) {<br />> + /* Check LKEY */<br />> + if (to_ipd(srq->ibsrq.pd)->user &&<br />> + wr->sg_list[i].lkey == 0) {<br />> + spin_unlock_irqrestore(&srq->rq.lock, flags);<br />> + *bad_wr = wr;<br />> + return -EINVAL;<br />> + }<br />> + if (wr->sg_list[i].length == 0)<br />> + continue;<br />> + if (!ipath_lkey_ok(&dev->lk_table,<br />> + &wqe->sg_list[j], &wr->sg_list[i],<br />> + IB_ACCESS_LOCAL_WRITE)) {<br />> + spin_unlock_irqrestore(&srq->rq.lock, flags);<br />> + *bad_wr = wr;<br />> + return -EINVAL;<br />> + }<br />> + wqe->length += wr->sg_list[i].length;<br />> + j++;<br />> + }<br />> + wqe->num_sge = j;<br />> + srq->rq.head = next;<br />> + spin_unlock_irqrestore(&srq->rq.lock, flags);<br />> + }<br />> + return 0;<br />> +}<br />> +<br />> +/*<br />> + * This is called from ipath_qp_rcv() to process an incomming UD packet<br />> + * for the given QP.<br />> + * Called at interrupt level.<br />> + */<br />> +static void ipath_ud_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,<br />> + int has_grh, void *data, u32 tlen, struct ipath_qp *qp)<br />> +{<br />> + struct ipath_other_headers *ohdr;<br />> + int opcode;<br />> + u32 hdrsize;<br />> + u32 pad;<br />> + unsigned long flags;<br />> + struct ib_wc wc;<br />> + u32 qkey;<br />> + u32 src_qp;<br />> + struct ipath_rq *rq;<br />> + struct ipath_srq *srq;<br />> + struct ipath_rwqe *wqe;<br />> +<br />> + /* Check for GRH */<br />> + if (!has_grh) {<br />> + ohdr = &hdr->u.oth;<br />> + hdrsize = 8 + 12 + 8; /* LRH + BTH + DETH */<br />> + qkey = be32_to_cpu(ohdr->u.ud.deth[0]);<br />> + src_qp = be32_to_cpu(ohdr->u.ud.deth[1]);<br />> + } else {<br />> + ohdr = &hdr->u.l.oth;<br />> + hdrsize = 8 + 40 + 12 + 8; /* LRH + GRH + BTH + DETH */<br />> + /*<br />> + * The header with GRH is 68 bytes and the<br />> + * core driver sets the eager header buffer<br />> + * size to 56 bytes so the last 12 bytes of<br />> + * the IB header is in the data buffer.<br />> + */<br />> + qkey = be32_to_cpu(((u32 *) data)[1]);<br />> + src_qp = be32_to_cpu(((u32 *) data)[2]);<br />> + data += 12;<br />> + }<br />> + src_qp &= 0xFFFFFF;<br />> +<br />> + /* Check that the qkey matches. */<br />> + if (unlikely(qkey != qp->qkey)) {<br />> + /* XXX OK to lose a count once in a while. */<br />> + dev->qkey_violations++;<br />> + dev->n_pkt_drops++;<br />> + return;<br />> + }<br />> +<br />> + /* Get the number of bytes the message was padded by. */<br />> + pad = (ohdr->bth[0] >> 12) & 3;<br />> + if (unlikely(tlen < (hdrsize + pad + 4))) {<br />> + /* Drop incomplete packets. 
*/<br />> + dev->n_pkt_drops++;<br />> + return;<br />> + }<br />> +<br />> + /*<br />> + * A GRH is expected to preceed the data even if not<br />> + * present on the wire.<br />> + */<br />> + wc.byte_len = tlen - (hdrsize + pad + 4) + sizeof(struct ib_grh);<br />> +<br />> + /*<br />> + * The opcode is in the low byte when its in network order<br />> + * (top byte when in host order).<br />> + */<br />> + opcode = *(u8 *) (&ohdr->bth[0]);<br />> + if (opcode == IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE) {<br />> + if (has_grh) {<br />> + wc.imm_data = *(u32 *) data;<br />> + data += sizeof(u32);<br />> + } else<br />> + wc.imm_data = ohdr->u.ud.imm_data;<br />> + wc.wc_flags = IB_WC_WITH_IMM;<br />> + hdrsize += sizeof(u32);<br />> + } else if (opcode == IB_OPCODE_UD_SEND_ONLY) {<br />> + wc.imm_data = 0;<br />> + wc.wc_flags = 0;<br />> + } else {<br />> + dev->n_pkt_drops++;<br />> + return;<br />> + }<br />> +<br />> + /*<br />> + * Get the next work request entry to find where to put the data.<br />> + * Note that it is safe to drop the lock after changing rq->tail<br />> + * since ipath_post_receive() won't fill the empty slot.<br />> + */<br />> + if (qp->ibqp.srq) {<br />> + srq = to_isrq(qp->ibqp.srq);<br />> + rq = &srq->rq;<br />> + } else {<br />> + srq = NULL;<br />> + rq = &qp->r_rq;<br />> + }<br />> + spin_lock_irqsave(&rq->lock, flags);<br />> + if (rq->tail == rq->head) {<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + dev->n_pkt_drops++;<br />> + return;<br />> + }<br />> + /* Silently drop packets which are too big. */<br />> + wqe = get_rwqe_ptr(rq, rq->tail);<br />> + if (wc.byte_len > wqe->length) {<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + dev->n_pkt_drops++;<br />> + return;<br />> + }<br />> + wc.wr_id = wqe->wr_id;<br />> + qp->r_sge.sge = wqe->sg_list[0];<br />> + qp->r_sge.sg_list = wqe->sg_list + 1;<br />> + qp->r_sge.num_sge = wqe->num_sge;<br />> + if (++rq->tail >= rq->size)<br />> + rq->tail = 0;<br />> + if (srq && srq->ibsrq.event_handler) {<br />> + u32 n;<br />> +<br />> + if (rq->head < rq->tail)<br />> + n = rq->size + rq->head - rq->tail;<br />> + else<br />> + n = rq->head - rq->tail;<br />> + if (n < srq->limit) {<br />> + struct ib_event ev;<br />> +<br />> + srq->limit = 0;<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + ev.device = qp->ibqp.device;<br />> + ev.element.srq = qp->ibqp.srq;<br />> + ev.event = IB_EVENT_SRQ_LIMIT_REACHED;<br />> + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context);<br />> + } else<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + } else<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + if (has_grh) {<br />> + copy_sge(&qp->r_sge, &hdr->u.l.grh, sizeof(struct ib_grh));<br />> + wc.wc_flags |= IB_WC_GRH;<br />> + } else<br />> + skip_sge(&qp->r_sge, sizeof(struct ib_grh));<br />> + copy_sge(&qp->r_sge, data, wc.byte_len - sizeof(struct ib_grh));<br />> + wc.status = IB_WC_SUCCESS;<br />> + wc.opcode = IB_WC_RECV;<br />> + wc.vendor_err = 0;<br />> + wc.qp_num = qp->ibqp.qp_num;<br />> + wc.src_qp = src_qp;<br />> + /* XXX do we know which pkey matched? Only needed for GSI. */<br />> + wc.pkey_index = 0;<br />> + wc.slid = be16_to_cpu(hdr->lrh[3]);<br />> + wc.sl = (be16_to_cpu(hdr->lrh[0]) >> 4) & 0xF;<br />> + wc.dlid_path_bits = 0;<br />> + /* Signal completion event if the solicited bit is set. 
*/<br />> + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc,<br />> + ohdr->bth[0] & __constant_cpu_to_be32(1 << 23));<br />> +}<br />> +<br />> +/*<br />> + * This is called from ipath_post_ud_send() to forward a WQE addressed<br />> + * to the same HCA.<br />> + */<br />> +static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_sge_state *ss,<br />> + u32 length, struct ib_send_wr *wr,<br />> + struct ib_wc *wc)<br />> +{<br />> + struct ipath_ibdev *dev = to_idev(sqp->ibqp.device);<br />> + struct ipath_qp *qp;<br />> + struct ib_ah_attr *ah_attr;<br />> + unsigned long flags;<br />> + struct ipath_rq *rq;<br />> + struct ipath_srq *srq;<br />> + struct ipath_sge_state rsge;<br />> + struct ipath_sge *sge;<br />> + struct ipath_rwqe *wqe;<br />> +<br />> + qp = ipath_lookup_qpn(&dev->qp_table, wr->wr.ud.remote_qpn);<br />> + if (!qp)<br />> + return;<br />> +<br />> + /* Check that the qkey matches. */<br />> + if (unlikely(wr->wr.ud.remote_qkey != qp->qkey)) {<br />> + /* XXX OK to lose a count once in a while. */<br />> + dev->qkey_violations++;<br />> + dev->n_pkt_drops++;<br />> + goto done;<br />> + }<br />> +<br />> + /*<br />> + * A GRH is expected to preceed the data even if not<br />> + * present on the wire.<br />> + */<br />> + wc->byte_len = length + sizeof(struct ib_grh);<br />> +<br />> + if (wr->opcode == IB_WR_SEND_WITH_IMM) {<br />> + wc->wc_flags = IB_WC_WITH_IMM;<br />> + wc->imm_data = wr->imm_data;<br />> + } else {<br />> + wc->wc_flags = 0;<br />> + wc->imm_data = 0;<br />> + }<br />> +<br />> + /*<br />> + * Get the next work request entry to find where to put the data.<br />> + * Note that it is safe to drop the lock after changing rq->tail<br />> + * since ipath_post_receive() won't fill the empty slot.<br />> + */<br />> + if (qp->ibqp.srq) {<br />> + srq = to_isrq(qp->ibqp.srq);<br />> + rq = &srq->rq;<br />> + } else {<br />> + srq = NULL;<br />> + rq = &qp->r_rq;<br />> + }<br />> + spin_lock_irqsave(&rq->lock, flags);<br />> + if (rq->tail == rq->head) {<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + dev->n_pkt_drops++;<br />> + goto done;<br />> + }<br />> + /* Silently drop packets which are too big. 
*/<br />> + wqe = get_rwqe_ptr(rq, rq->tail);<br />> + if (wc->byte_len > wqe->length) {<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + dev->n_pkt_drops++;<br />> + goto done;<br />> + }<br />> + wc->wr_id = wqe->wr_id;<br />> + rsge.sge = wqe->sg_list[0];<br />> + rsge.sg_list = wqe->sg_list + 1;<br />> + rsge.num_sge = wqe->num_sge;<br />> + if (++rq->tail >= rq->size)<br />> + rq->tail = 0;<br />> + if (srq && srq->ibsrq.event_handler) {<br />> + u32 n;<br />> +<br />> + if (rq->head < rq->tail)<br />> + n = rq->size + rq->head - rq->tail;<br />> + else<br />> + n = rq->head - rq->tail;<br />> + if (n < srq->limit) {<br />> + struct ib_event ev;<br />> +<br />> + srq->limit = 0;<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + ev.device = qp->ibqp.device;<br />> + ev.element.srq = qp->ibqp.srq;<br />> + ev.event = IB_EVENT_SRQ_LIMIT_REACHED;<br />> + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context);<br />> + } else<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + } else<br />> + spin_unlock_irqrestore(&rq->lock, flags);<br />> + ah_attr = &to_iah(wr->wr.ud.ah)->attr;<br />> + if (ah_attr->ah_flags & IB_AH_GRH) {<br />> + copy_sge(&rsge, &ah_attr->grh, sizeof(struct ib_grh));<br />> + wc->wc_flags |= IB_WC_GRH;<br />> + } else<br />> + skip_sge(&rsge, sizeof(struct ib_grh));<br />> + sge = &ss->sge;<br />> + while (length) {<br />> + u32 len = sge->length;<br />> +<br />> + if (len > length)<br />> + len = length;<br />> + BUG_ON(len == 0);<br />> + copy_sge(&rsge, sge->vaddr, len);<br />> + sge->vaddr += len;<br />> + sge->length -= len;<br />> + sge->sge_length -= len;<br />> + if (sge->sge_length == 0) {<br />> + if (--ss->num_sge)<br />> + *sge = *ss->sg_list++;<br />> + } else if (sge->length == 0 && sge->mr != NULL) {<br />> + if (++sge->n >= IPATH_SEGSZ) {<br />> + if (++sge->m >= sge->mr->mapsz)<br />> + break;<br />> + sge->n = 0;<br />> + }<br />> + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr;<br />> + sge->length = sge->mr->map[sge->m]->segs[sge->n].length;<br />> + }<br />> + length -= len;<br />> + }<br />> + wc->status = IB_WC_SUCCESS;<br />> + wc->opcode = IB_WC_RECV;<br />> + wc->vendor_err = 0;<br />> + wc->qp_num = qp->ibqp.qp_num;<br />> + wc->src_qp = sqp->ibqp.qp_num;<br />> + /* XXX do we know which pkey matched? Only needed for GSI. */<br />> + wc->pkey_index = 0;<br />> + wc->slid = ipath_layer_get_lid(dev->ib_unit);<br />> + wc->sl = ah_attr->sl;<br />> + wc->dlid_path_bits = 0;<br />> + /* Signal completion event if the solicited bit is set. 
 */
> + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), wc,
> + wr->send_flags & IB_SEND_SOLICITED);
> +
> +done:
> + if (atomic_dec_and_test(&qp->refcount))
> + wake_up(&qp->wait);
> +}
> +
> +/*
> + * Copy the next RWQE into the QP's RWQE.
> + * Return zero if no RWQE is available.
> + * Called at interrupt level with the QP r_rq.lock held.
> + */
> +static int get_rwqe(struct ipath_qp *qp, int wr_id_only)
> +{
> + struct ipath_rq *rq;
> + struct ipath_srq *srq;
> + struct ipath_rwqe *wqe;
> +
> + if (!qp->ibqp.srq) {
> + rq = &qp->r_rq;
> + if (unlikely(rq->tail == rq->head))
> + return 0;
> + wqe = get_rwqe_ptr(rq, rq->tail);
> + qp->r_wr_id = wqe->wr_id;
> + if (!wr_id_only) {
> + qp->r_sge.sge = wqe->sg_list[0];
> + qp->r_sge.sg_list = wqe->sg_list + 1;
> + qp->r_sge.num_sge = wqe->num_sge;
> + qp->r_len = wqe->length;
> + }
> + if (++rq->tail >= rq->size)
> + rq->tail = 0;
> + return 1;
> + }
> +
> + srq = to_isrq(qp->ibqp.srq);
> + rq = &srq->rq;
> + spin_lock(&rq->lock);
> + if (unlikely(rq->tail == rq->head)) {
> + spin_unlock(&rq->lock);
> + return 0;
> + }
> + wqe = get_rwqe_ptr(rq, rq->tail);
> + qp->r_wr_id = wqe->wr_id;
> + if (!wr_id_only) {
> + qp->r_sge.sge = wqe->sg_list[0];
> + qp->r_sge.sg_list = wqe->sg_list + 1;
> + qp->r_sge.num_sge = wqe->num_sge;
> + qp->r_len = wqe->length;
> + }
> + if (++rq->tail >= rq->size)
> + rq->tail = 0;
> + if (srq->ibsrq.event_handler) {
> + struct ib_event ev;
> + u32 n;
> +
> + if (rq->head < rq->tail)
> + n = rq->size + rq->head - rq->tail;
> + else
> + n = rq->head - rq->tail;
> + if (n < srq->limit) {
> + srq->limit = 0;
> + spin_unlock(&rq->lock);
> + ev.device = qp->ibqp.device;
> + ev.element.srq = qp->ibqp.srq;
> + ev.event = IB_EVENT_SRQ_LIMIT_REACHED;
> + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context);
> + } else
> + spin_unlock(&rq->lock);
> + } else
> + spin_unlock(&rq->lock);
> + return 1;
> +}
> -- 
> 0.99.9n
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/