CINXE.COM
SGT-Light Spectral Graph Transducer
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD><TITLE>SGT-Light Spectral Graph Transducer</TITLE> <META http-equiv=Content-Type content="text/html; charset=windows-1252"> <META content="Microsoft FrontPage 4.0" name=GENERATOR> <META content=8.0.3514 name=Version> <META content=11/26/96 name=Date> <META content="C:\Programme\Microsoft Office\Office\HTML.DOT" name=Template></HEAD> <BODY vLink=#800080 link=#0000ff bgColor=#ffffff> <div align="justify"> <TABLE cellSpacing=0 cellPadding=5 border=0 width="100%"> <TBODY> <TR> <TD vAlign=top nowrap> <H2><IMG height=80 src="http://www.joachims.org/images/culogo_125.gif" width=80></H2></TD> <TD vAlign=top> <H1 align=center>SGT<I><SUP>light</SUP> </H1></I> <H1 align=center>Spectral Graph Transducer</H1><FONT color=#000000> <P align=center>Author: </FONT><A href="http://www.joachims.org/" target=_top>Thorsten Joachims</A><FONT color=#000000> <</FONT><A href="mailto:thorsten@joachims.org">thorsten@joachims.org</A><FONT color=#000000>> <BR></FONT><A href="http://www.cornell.edu/" target=_top>Cornell University</A><FONT color=#000000> <BR></FONT><A href="http://www.cs.cornell.edu/" target=_top>Department of Computer Science</A> </P> <FONT color=#000000> <P align=center>Version: 1.00 <BR>Date: 20.08.2003</FONT></P></TD> <TD vAlign=top nowrap> <H2><IMG height=80 src="http://www.joachims.org/images/culogo_125.gif" width=80></H2></TD></TR></TBODY></TABLE> </div> <H2>Description</H2><FONT color=#000000> <P>SGT<I><SUP>light</I></SUP> is an implementation of a Spectral Graph Transducer (SGT) [Joachims, 2003] in C using Matlab libraries. The SGT is a method for transductive learning. It solves a normalized-cut (or ratio-cut) problem with additional constraints for the labeled examples using spectral methods. The approach is efficient enough to handle datasets with several ten-thousands of examples. </P> </FONT> <H2>Source Code and Binaries</H2><FONT color=#000000> <P>The program is free for scientific use. Please contact me, if you are planning to use the software for commercial purposes. The software must not be distributed without prior permission of the author. If you use SGT<I><SUP>light</I></SUP> in your scientific work, please cite as </P> <UL> <LI>T. Joachims, Transductive Learning via Spectral Graph Partitioning, International Conference on Machine Learning (ICML), 2003. <BR></FONT>[<a href="http://www.joachims.org/publications/joachims_03a.pdf">PDF</a>] [<a href="http://www.joachims.org/publications/joachims_03a.ps.gz">Postscript (gz)</a>] </LI></UL><FONT color=#000000> <P>I would also appreciate, if you sent me (a link to) your papers so that I can learn about your research. The implementation was developed on Windows 2000 with mcc and Matlab R12, but compiles also on Linux and Solaris. The source code is available at the following location: </P> <DIR></FONT><P><a href="https://download.joachims.org/sgt_light/current/sgt_light.tar.gz" target="_top">https://download.joachims.org/sgt_light/current/sgt_light.tar.gz</a></P></DIR> <P>If you just want the binaries, you can download them for the following systems:</P> <UL> <LI>Solaris: <a href="https://download.joachims.org/sgt_light/current/sgt_light_solaris.tar.gz" target="_top">https://download.joachims.org/sgt_light/current/sgt_light_solaris.tar.gz</a> <LI>Windows: <a href="https://download.joachims.org/sgt_light/current/sgt_light_windows.tar.gz" target="_top">https://download.joachims.org/sgt_light/current/sgt_light_windows.tar.gz</a> <LI>Linux: <a href="https://download.joachims.org/sgt_light/current/sgt_light_linux.tar.gz" target="_top">https://download.joachims.org/sgt_light/current/sgt_light_linux.tar.gz</a> </LI></UL> <P>Please send me email<FONT color=#000000> <<a href="mailto:tj@cs.cornell.edu">tj@cs.cornell.edu</a>> and let me know that you got SGT<I><SUP>light</SUP></I>. I will put you on my mailing list to inform you about new versions and bug-fixes.</FONT> </P> <H2>Installation (Source Code)</H2><FONT color=#000000> <P>To compile and install the source code of SGT<I><SUP>light</I></SUP> you need to download <TT>sgt_light.tar.gz</TT>. Unpack it with </P> <DIR><TT> <P>gunzip -c sgt_light.tar.gz | tar xvf -</P></DIR></TT> <P>You compile the source code with </P> <DIR><TT> <P>make</P></DIR></TT> <P>which compiles all executables (type <TT>make info</TT> for additional options). Note that you have to have Matlab installed with the mcc compiler. You will have five executables: </P> <DIR><TT> <P>sgt_buildknngraph (creates a k-NN graph from feature vectors)</TT> <BR><TT>sgt_decompgraph (compute first d eigenvectors of graph Laplacian)<br> sgt_predict (predict labels of test examples)<br> sgt_traintestsplit (make random test/training split)<br> sgt_addgraphs (adds adjacency matrices of two graphs)</TT></P></DIR><P>To run the executables, you need the Matlab runtime library, which is available as the compressed archive <tt>mglinstaller</tt>. The runtime library is delivered with Matlab (see <a href="http://www.mathworks.com/access/helpdesk/help/toolbox/compiler/ch04st20.shtml#984826">here</a>). Copy <tt>mglinstaller</tt> into your sgt-light directory and call <DIR><TT> <P>mglinstaller</P></DIR></TT> which creates two subdirectories <tt>bin</tt> and <tt>toolbox</tt>. See <a href="http://www.mathworks.com/access/helpdesk/help/toolbox/compiler/ch04st20.shtml#984826">here</a> for more details on how to obtain and install this library. Finally, you must extend your search path by executing <DIR> <P> Solaris: <TT>setenv LD_LIBRARY_PATH ./bin/sol2:$PATH</TT><BR> Windows: <TT>set PATH=.\bin\win32;%PATH%</TT><BR> Linux: <TT>setenv LD_LIBRARY_PATH ./bin/glnx86:$PATH</TT></DIR></TT> to make sure the library is accessible. For Matlab R13 on Linux and Solaris, there seems to be a bug in the math library and <TT>sgt_decompgraph</TT> might not find some mex files. To fix the problem, execute <DIR><TT> <P>cp toolbox/matlab/datatypes/cellfun.* .<br> cp toolbox/matlab/sparfun/arpackc.* .</P> </DIR></TT> </P></FONT> <H2>Installation (Binaries)</H2><FONT color=#000000> <P>To install one of the binary archives of SGT<I><SUP>light</I></SUP> (e.g. <TT>sgt_light_solaris.tar.gz</TT>) you need to download it and unpack it with </P> <TT> <blockquote> <P>gunzip -c sgt_light_solaris.tar.gz | tar xvf -<br> gunzip -c sgt_light_windows.tar.gz | tar xvf -<br> gunzip -c sgt_light_linux.tar.gz | tar xvf -</P> </blockquote> </TT> You will have five executables: </P> <DIR><TT> <P>sgt_buildknngraph (creates a k-NN graph from feature vectors)</TT> <BR><TT>sgt_decompgraph (compute first d eigenvectors of graph Laplacian)<br> sgt_predict (predict labels of test examples)<br> sgt_traintestsplit (make random test/training split)<br> sgt_addgraphs (adds adjacency matrices of two graphs)</TT></P></DIR> <P>To run the executables, you need the Matlab runtime libraries, which are included in the binary archive (also, see <a href="http://www.mathworks.com/access/helpdesk/help/toolbox/compiler/ch04st20.shtml#984826">here</a> for details on the Matlab libraries). Finally, you must extend your search path by executing <DIR> <P> Solaris: <TT>setenv LD_LIBRARY_PATH ./bin/sol2:$PATH</TT><BR> Windows: <TT>set PATH=.\bin\win32;%PATH%</TT><BR> Linux: <TT>setenv LD_LIBRARY_PATH ./bin/glnx86:$PATH</TT></DIR></TT> </P> to make sure the library is accessible. </FONT> <H2>How to use</H2><FONT color=#000000> <P>This section explains how to use the SGT<I><SUP>light</I></SUP> software. The background is given in [Joachims, 2003]. </P> <P>SGT<I><SUP>light</I></SUP> consists of three main modules and two additional programs for convenience. The main modules are, first, a program (<TT>sgt_buildknngraph</TT>) for constructing the k nearest neighbor graph from data in the form of feature vectors given in SVM<I><SUP>light</SUP> </I> format (see <a href="http://svmlight.joachims.org">here</a>). Second, a program (<TT>sgt_decompgraph</TT>) for computing the d smallest eigenvectors of the Laplacian of a graph, and third a module (<TT>sgt_predict</TT>) for computing the acutal predictions for a given training and test set. Furthermore, there is a program (<TT>sgt_traintestsplit</TT>) for computing a random test/training split and a program (<TT>sgt_addgraphs</TT>) for adding two adjacency matrices (used for co-training). </P><P>You can get a description of the parameters of each module by calling it with the option -?. </FONT> <FONT color=#000000> The following examples illustrate how to use the different programs. </P> </FONT><H3>Example: Text Classification</H3><FONT color=#000000> <P>You will find an example text classification problem at </P> <DIR></FONT> <P><a href="https://download.joachims.org/sgt_light/example1/example1.tar.gz" target="_top">https://download.joachims.org/sgt_light/examples/example1.tar.gz</a></P></DIR><FONT color=#000000> <P>Download this file into your sgt_light directory and unpack it with </P> <DIR><TT> <P>gunzip -c example1.tar.gz | tar xvf -</P></DIR></TT> <P>This will create a subdirectory <TT>example1</TT>. Documents are represented as feature vectors. Each feature corresponds to a word stem (9947 features). The task is to learn which </FONT><A href="http://www.research.att.com/~lewis/reuters21578.html" target=_top>Reuters articles</A><FONT color=#000000> are about "corporate acquisitions". There are 300 positive and 300 negative examples in the file <tt>examples.dat</tt>. The feature numbers correspond to the line numbers in the file <TT>words</TT>. To run the example, execute the commands:</FONT> </P> <FONT color=#000000> <TT> <blockquote> <P>sgt_buildknngraph -k 50 example1/examples.dat example1/graph.gra<br> sgt_decompgraph -d 80 example1/graph.gra example1/graph.dcg</P> </blockquote> </TT></FONT> <P><FONT color=#000000>The first command takes the input points in <TT>example1/examples.dat</TT> and computes the weighted 50 nearest neighbor graph (using cosine similarity) which is stored in <TT>example1/graph.gra</TT>. See (<TT>svm_buildknngraph -?</TT>) for additional options. The second command computes the 80 first eigenvectors of the (normalized) Laplacian of <TT>example1/graph.gra</TT> and stores them in <TT>example1/graph.dcg</TT>. Both <TT>sgt_buildknngraph</TT> and <TT>sgt_decompgraph</TT> need to be called only once for each dataset, since their result is independent of the particular test/training split and the labels in the training set.</FONT> </P> <P><font color="#000000">To generate a random test/training split with 10 examples in the training set, call the command</font> </P> <FONT color=#000000> <DIR><P><TT>sgt_traintestsplit -n 10 example1/examples.dat example1/train.dat example1/test.dat</P></DIR></TT> <P>The files <TT>example1/train.dat</TT> and <TT>example1/test.dat</TT> contain the train and test labels in the same order as in the file <TT> example1/examples.dat</TT>. +1 stands for a positive label, -1 stands for a negative label, and 0 for unlabeled. To compute the predictions for this training/test split, call </P> <blockquote> <P><TT>sgt_predict -c 1000 -l example1/test.dat example1/train.dat example1/graph.dcg example1/pred</TT> </P> </blockquote> <P>Since the true test labels are given (<TT>-l example1/test.dat</TT>), the prediction accuracy on the test set is computed and printed to stdout. The trade-off between training error and complexity is set via <TT>-c 1000</TT> similar to an SVM. The individual predictions on the training and test examples are output to <TT>example1/pred</TT>, in the same order as in the file <TT>example1/train.dat</TT>. A positive value indicates prediction as a positive example, a negative value indicates a negative prediction. </P></FONT><H3>Example: Co-Training</H3><FONT color=#000000> <P>Sorry, not yet ready. </P> <H2>Questions and Bug Reports</H2><FONT color=#000000> <P>If you find bugs or you have problems with the code you cannot solve by yourself, please contact me via email <</FONT><a href="mailto:tj@cs.cornell.edu">tj@cs.cornell.edu</a>><FONT color=#000000>. </P></FONT> <H2>Disclaimer</H2><FONT color=#000000> <P>This software is free only for non-commercial use. It must not be distributed without prior permission of the author. The author is not responsible for implications from the use of this software. </P></FONT> <H2>History</H2> <H4>V1.00 </H4> <UL> <LI>Initial version </LI></UL> <H2>References</H2> <div align="justify"> <TABLE cellSpacing=0 cellPadding=5 border=0 width="100%"> <TBODY> <TR> <TD vAlign=top nowrap> <P><FONT color=#000000>[Joachims, 2003a]</FONT></P></TD> <TD vAlign=top><FONT color=#000000> <P>Thorsten Joachims, <i>Transductive Learning via Spectral Graph Partitioning</i>, Proceedings of the International Conference on Machine Learning (ICML), 2003.<BR>[<a href="http://www.joachims.org/publications/joachims_03a.pdf">PDF</a>] [<a href="http://www.joachims.org/publications/joachims_03a.ps.gz">Postscript (gz)</a>] </FONT></P></TD></TR> </TBODY></TABLE> </div> <P>Last modified August 22nd, 2003 by <A href="http://www.joachims.org/" target=_top>Thorsten Joachims</A><FONT color=#000000> <</FONT><A href="mailto:thorsten@joachims.org">thorsten@joachims.org</A><FONT color=#000000>></P></FONT></FONT></BODY></HTML>