Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model

<!DOCTYPE html> <html> <head>  <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet"> <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.0.13/css/all.css" integrity="sha384-DNOHZ68U8hZfKXOrtjWvjxusGo9WQnrNx2sqG0tfsghAvtVlRW3tvkXWZh58N9jp" crossorigin="anonymous"> <link href="https://fonts.googleapis.com/css?family=Roboto" rel="stylesheet">  <link type="text/css" rel="stylesheet" href="css/materialize.min.css" media="screen,projection" /> <link type="text/css" rel="stylesheet" href="css/main.css" />  <title>Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model</title>  <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model"> <meta name="description" content="With the development of large-scale Language Models (LLM), fine-tuning pre-trained LLM has become a mainstream paradigm for solving downstream tasks of natural language processing. However, training a language model in the legal field requires a large number of legal documents so that the language model can learn legal terminology and the particularity of the format of legal documents. The typical NLP approaches usually rely on many manually annotated data sets for training. However, in the legal field application, it is difficult to obtain a large number of manually annotated data sets, which restricts the typical method applied to the task of drafting legal documents. 
The experimental results of this paper show that not only can we leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model, but more importantly, it can fine-tune a pre-trained LLM on the local computer to achieve the generating legal document drafts task, and at the same time achieve the protection of information privacy and to improve information security issues."> <meta name="keywords" content="LLM, Legal Document Drafting, Fine-tuning Large Language Models, Text Generation, Computer Science, Technology, open access proceedings"/>   <meta name="dc.title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model"> <meta name="citation_authors" content="Chun-Hsien Lin"> <meta name="citation_authors" content="Pu-Jen Cheng"> <meta name="dc.type" content="Article"> <meta name="dc.source" content="Computer Science & Information Technology (CS & IT) Vol.14, No.08"> <meta name="dc.date" content="2024/04/28"> <meta name="dc.identifier" content="10.5121/csit.2024.140819"> <meta name="dc.publisher" content="AIRCC Publishing Corporation"> <meta name="dc.rights" content="http://creativecommons.org/licenses/by/3.0/"> <meta name="dc.format" content="application/pdf"> <meta name="dc.language" content="en"> <meta name="dc.description" content="With the development of large-scale Language Models (LLM), fine-tuning pre-trained LLM has become a mainstream paradigm for solving downstream tasks of natural language processing. However, training a language model in the legal field requires a large number of legal documents so that the language model can learn legal terminology and the particularity of the format of legal documents. The typical NLP approaches usually rely on many manually annotated data sets for training. 
However, in the legal field application, it is difficult to obtain a large number of manually annotated data sets, which restricts the typical method applied to the task of drafting legal documents. The experimental results of this paper show that not only can we leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model, but more importantly, it can fine-tune a pre-trained LLM on the local computer to achieve the generating legal document drafts task, and at the same time achieve the protection of information privacy and to improve information security issues."/> <meta name="dc.subject" content="LLM"> <meta name="dc.subject" content="Legal Document Drafting"> <meta name="dc.subject" content="Fine-tuning Large Language Models"> <meta name="dc.subject" content="Text Generation">   <meta name="prism.publicationName" content="Computer Science & Information Technology (CS & IT)"> <meta name="prism.publicationDate" content="2024/04/28"> <meta name="prism.volume" content="14"> <meta name="prism.number" content="08"> <meta name="prism.section" content="Article"> <meta name="prism.startingPage" content="201">   <meta name="citation_journal_title" content="Computer Science & Information Technology (CS & IT)"> <meta name="citation_publisher" content="AIRCC Publishing Corporation"> <meta name="citation_authors" content="Chun-Hsien Lin and Pu-Jen Cheng"> <meta name="citation_title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model"> <meta name="citation_online_date" content="2024/04/28"> <meta name="citation_issue" content="14"> <meta name="citation_firstpage" content="201"> <meta name="citation_authors" content="Chun-Hsien Lin"> <meta name="citation_authors" content="Pu-Jen Cheng"> <meta name="citation_doi" content="10.5121/csit.2024.140819"> <meta name="citation_abstract_html_url" content="https://aircconline.com/csit/abstract/v14n8/csit140819.html"> <meta 
name="citation_pdf_url" content="https://aircconline.com/csit/papers/vol14/csit140819.pdf">   <meta property="og:site_name" content="AIRCC" /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://aircconline.com/csit/abstract/v14n8/csit140819.html"> <meta property="og:title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model"> <meta property="og:description" content="With the development of large-scale Language Models (LLM), fine-tuning pre-trained LLM has become a mainstream paradigm for solving downstream tasks of natural language processing. However, training a language model in the legal field requires a large number of legal documents so that the language model can learn legal terminology and the particularity of the format of legal documents. The typical NLP approaches usually rely on many manually annotated data sets for training. However, in the legal field application, it is difficult to obtain a large number of manually annotated data sets, which restricts the typical method applied to the task of drafting legal documents. The experimental results of this paper show that not only can we leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model, but more importantly, it can fine-tune a pre-trained LLM on the local computer to achieve the generating legal document drafts task, and at the same time achieve the protection of information privacy and to improve information security issues."/>   <meta name="twitter:card" content="Proceedings" /> <meta name="twitter:site" content="AIRCC" /> <meta name="twitter:title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model" /> <meta name="twitter:description" content="With the development of large-scale Language Models (LLM), fine-tuning pre-trained LLM has become a mainstream paradigm for solving downstream tasks of natural language processing. 
However, training a language model in the legal field requires a large number of legal documents so that the language model can learn legal terminology and the particularity of the format of legal documents. The typical NLP approaches usually rely on many manually annotated data sets for training. However, in the legal field application, it is difficult to obtain a large number of manually annotated data sets, which restricts the typical method applied to the task of drafting legal documents. The experimental results of this paper show that not only can we leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model, but more importantly, it can fine-tune a pre-trained LLM on the local computer to achieve the generating legal document drafts task, and at the same time achieve the protection of information privacy and to improve information security issues."/> <meta name="twitter:image" content="https://airccse.org/img/aircc-logo1.jpg" />  <style type="text/css"> .rdd { text-align: center; background: #f2f2f2; color: #000; font-weight: 700; width: 130px; height: 110px; border-radius: 100%; box-shadow: inset 1px 0px 22px 3px #4080ca; font-family: 'Oswald', sans-serif; border: 5px solid #1f8ea3; margin: 5% auto; line-height: 110px; } </style> <script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script> <script> (adsbygoogle = window.adsbygoogle || []).push({ google_ad_client: "ca-pub-1537319084895272", enable_page_level_ads: true }); </script> </head> <body>  <div class="navbar-fixed"> <nav class="cyan lighten-2 z-depth-5"> <div class="container"> <div class="nav-wrapper"> <ul> <li id="b-logo"> <img id="brand-logo" href="index.php" class="hide-on-med-and-down" src="img/aircc-logo1.jpg"> </li> </ul> <a class="brand-logo" href="index.php">AIRCC</a> <a data-activates="side-nav" class="button-collapse show-on-small left"> <i class="material-icons">menu</i> </a> <ul 
class="right hide-on-med-and-down"> <li > <a href="https://aircconline.com/">Home</a> </li> <li> <a href="https://airccse.org/csit/V14N08.html">Current Issue</a> </li> <li> <a href="https://airccse.org/arch.html">Archives</a> </li> <li> <a href="https://airccse.org/csit/acontact.html">Contact</a> </li> <li> <a class="openIcon" onclick="openSearch()"> <i class="material-icons">search</i> </a> </li> </ul> </div> </div> </nav> </div>  <ul class="side-nav" id="side-nav"> <li> <div class="user-view arc"> <div class="background"> <img class="mobile-overlay" > </div> <a href="https://aircconline.com/"> <i id="cl" class="material-icons cyan-text text-lighten-2 right">close</i> </a> <a href="https://aircconline.com/"> <img class="circle" src="img/aircc-logo1.jpg"> </a> <h4 class="grey-text">AIRCC</h4> </div> </li> <li > <a href="https://aircconline.com/">Home <i class="material-icons">home</i> </a> </li> <li> <a href="https://airccse.org/csit/V14N08.html">Current Issue <i class="fas fa-users"></i> </a> </li> <li> <a href="https://airccse.org/arch.html">Archives <i class="fas fa-users"></i> </a> </li> <li> <a href="https://airccse.org/csit/acontact.html">Contact <i class="fas fa-calendar-alt"></i> </a> </li> <li>  <li> <a class="openIcon" id="icon" onclick="openSearch()"> <i class="material-icons">search</i> Search </a> </li> </ul>  <div id="myOverlay" class="overlay"> <span class="closeIcon" onclick="closeSearch()" title="Close Overlay">×</span> <div class="overlay-content"> <form action="https://airccj.org/csecfp/library/index.php"> <input type="text" placeholder="Search.." 
name="title"> <button type="submit"> <i class="material-icons center">search</i> </button> </form> </div> </div>  <section class="section-main"> <div class="container"> <div class="row"> <div class="col s12 m8"> <div class="card z-depth-2"> <div class="card-content"> <h5 class="cyan-text center text-darken-1"> Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model </h5> </div> </div> <br> <div class="card"> <h5 id="about" class="brown-text text-darken-2 text-center" style="padding-bottom:0px">Authors</h5>  <div class="card-content"> <p class="left-text" style="text-align:justify"> Chun-Hsien Lin and Pu-Jen Cheng, National Taiwan University, Taiwan </p> </div> </div>   <div class="card"> <h5 id="about" class="brown-text text-darken-2 text-center" style="padding-bottom:0px">Abstract</h5>  <div class="card-content"> <p class="left-text" style="text-align:justify"> With the development of large language models (LLMs), fine-tuning a pre-trained LLM has become a mainstream paradigm for solving downstream natural language processing tasks. However, training a language model for the legal field requires a large number of legal documents so that the model can learn legal terminology and the particular format of legal documents. Typical NLP approaches rely on large manually annotated data sets for training, but such data sets are difficult to obtain in the legal field, which limits the use of these approaches for drafting legal documents. 
The experimental results of this paper show that a large number of annotation-free legal documents can be used, without Chinese word segmentation, to fine-tune a large-scale language model. More importantly, a pre-trained LLM can be fine-tuned on a local computer to generate legal document drafts, which also protects information privacy and improves information security. </p> </div> </div> <div class="card"> <h5 id="about" class="brown-text text-darken-2 text-center" style="padding-bottom:0px">Keywords</h5>  <div class="card-content"> <p> LLM, Legal Document Drafting, Fine-tuning Large Language Models, Text Generation </p> </div> </div> <div class="card-content"> <a href="https://aircconline.com/csit/papers/vol14/csit140819.pdf" target="_blank" class="btn btn-small lighten-2 cyan lig">Full Text</a>  <a href="https://airccse.org/csit/V14N08.html" target="_blank" class="btn btn-small lighten-2 cyan lig">Volume 14 Number 08</a> </div> </div>  <div id="side-bar" class="col s12 m4"> <div id="section-main"> <div class="card side cyan lighten-2"> <div class="card-content"> <ul> <li class="ax waves-effect waves-light"> <a class="white-text" href="https://airccse.org/editorial.html" target="blank"><i class="material-icons left">account_circle</i>Editorial Board</a> <br> </li> <br> <br> <div class="divider"></div> <br> <li class="ax waves-effect waves-light"> <a class="white-text" href="https://airccse.org/arch.html" target="blank"> <i class="material-icons fa fa-archive left"></i>Archives </a> <br> </li> <br> <br> <div class="divider"></div> <br> <li class="ax waves-effect waves-light"> <a class="white-text" target="blank" href="https://airccse.org/indexing.html"> <i class="material-icons left">local_pharmacy</i>Indexing</a> </li> <br> <br> <div class="divider"></div> <br> <li class="ax waves-effect waves-light"> <a class="white-text" target="blank" href="http://airccse.org/faq.html"> <i 
class="material-icons left">quiz</i>FAQ</a> </li> </ul> </div> </div> </div> </div> </div> </div> </section>  <div id="txtcnt"></div>  <footer class="page-footer cyan lighten-3"> <div class="container"> <div class="row"> <div class="footer-m col s12 m6 l3 "> <ul> <li> <img src="img/since2008.png" alt="since2008"> </li> </ul> </div> <div class="footer-m col s12 m6 l3 "> <ul> <li> <a class="white-text" href="ethics.html">Ethics</a> </li> <li> <a class="white-text" href="faq.html">FAQ</a> </li> <li> <a class="white-text" href="subscription.html">Subscription</a> </li> </ul> </div> <div class="footer-m col s12 m6 l3 offset-m1"> <ul> <li> <a class="white-text" href="acontact.html">Contact</a> </li> <li> <a class="white-text" href="https://airccse.org/sitemap.html">Sitemap</a> </li> </ul> </div> <div class="social col s12 m6 l3 offset-m1"> <ul> <li> <a class="blue-text text-darken-4" href="https://www.facebook.com/AIRCCPC" target="blank"> <i class="fab fa-facebook"> </i> </a> </li> <li> <a class="cyan-text " href="https://twitter.com/AIRCCFP" target="blank"> <i class="fab fa-twitter"></i> </a> </li> <li> <a class="red-text text-darken-4" href="https://www.youtube.com/channel/UCzkuYvuKuNCIc3jbE52IeZg" target="blank"> <i class="fab fa-youtube"></i> </a> </li> </ul> </div> </div> </div> <div class="footer-copyright grey darken-2"> <div class="container center-align"> <large class="white-text">Not for Profit @ All Rights Reserved ® AIRCC </large> </div> </div>  <div class="col s12 m10 offset-m1"> <div class="grey darken-3 center-align"> <small class="white-text">Designed and Developed by Wireilla Delivery Team</small> </div> </div> </footer> <script type="text/javascript" src="https://code.jquery.com/jquery-3.2.1.min.js"></script> <script type="text/javascript" src="js/materialize.min.js"></script> <script src="js/scrolltop.js"></script> <script src="js/search.js"></script> <script src="js/popup.js"></script> <script src="js/main.jquery.js"></script> </body>  </html>

