Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model
<!DOCTYPE html> <html> <head> <!--Import Google Icon Font--> <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet"> <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.0.13/css/all.css" integrity="sha384-DNOHZ68U8hZfKXOrtjWvjxusGo9WQnrNx2sqG0tfsghAvtVlRW3tvkXWZh58N9jp" crossorigin="anonymous"> <link href="https://fonts.googleapis.com/css?family=Roboto" rel="stylesheet"> <!--Import materialize.css--> <link type="text/css" rel="stylesheet" href="css/materialize.min.css" media="screen,projection" /> <link type="text/css" rel="stylesheet" href="css/main.css" /> <!-- <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous"> --> <title>Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model</title> <!--Let browser know website is optimized for mobile--> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model"> <meta name="description" content="With the development of large-scale Language Models (LLM), fine-tuning pre-trained LLM has become a mainstream paradigm for solving downstream tasks of natural language processing. However, training a language model in the legal field requires a large number of legal documents so that the language model can learn legal terminology and the particularity of the format of legal documents. The typical NLP approaches usually rely on many manually annotated data sets for training. However, in the legal field application, it is difficult to obtain a large number of manually annotated data sets, which restricts the typical method applied to the task of drafting legal documents. 
The experimental results of this paper show that not only can we leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model, but more importantly, it can fine-tune a pre-trained LLM on the local computer to achieve the generating legal document drafts task, and at the same time achieve the protection of information privacy and to improve information security issues."> <meta name="keywords" content="LLM, Legal Document Drafting, Fine-tuning Large Language Models, Text Generation, Computer Science, Technology, open access proceedings"/> <!-- end common meta tags --> <!-- Dublin Core(DC) meta tags --> <meta name="dc.title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model"> <meta name="citation_authors" content="Chun-Hsien Lin"> <meta name="citation_authors" content="Pu-Jen Cheng"> <meta name="dc.type" content="Article"> <meta name="dc.source" content="Computer Science & Information Technology (CS & IT) Vol.14, No.08"> <meta name="dc.date" content="2024/04/28"> <meta name="dc.identifier" content="10.5121/csit.2024.140819"> <meta name="dc.publisher" content="AIRCC Publishing Corporation"> <meta name="dc.rights" content="http://creativecommons.org/licenses/by/3.0/"> <meta name="dc.format" content="application/pdf"> <meta name="dc.language" content="en"> <meta name="dc.description" content="With the development of large-scale Language Models (LLM), fine-tuning pre-trained LLM has become a mainstream paradigm for solving downstream tasks of natural language processing. However, training a language model in the legal field requires a large number of legal documents so that the language model can learn legal terminology and the particularity of the format of legal documents. The typical NLP approaches usually rely on many manually annotated data sets for training. 
However, in the legal field application, it is difficult to obtain a large number of manually annotated data sets, which restricts the typical method applied to the task of drafting legal documents. The experimental results of this paper show that not only can we leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model, but more importantly, it can fine-tune a pre-trained LLM on the local computer to achieve the generating legal document drafts task, and at the same time achieve the protection of information privacy and to improve information security issues."/> <meta name="dc.subject" content="LLM"> <meta name="dc.subject" content="Legal Document Drafting"> <meta name="dc.subject" content="Fine-tuning Large Language Models"> <meta name="dc.subject" content="Text Generation"> <!-- End Dublin Core(DC) meta tags --> <!-- Prism meta tags --> <meta name="prism.publicationName" content="Computer Science & Information Technology (CS & IT)"> <meta name="prism.publicationDate" content="2024/04/28"> <meta name="prism.volume" content="14"> <meta name="prism.number" content="08"> <meta name="prism.section" content="Article"> <meta name="prism.startingPage" content="201"> <!-- End Prism meta tags --> <!-- citation meta tags --> <meta name="citation_journal_title" content="Computer Science & Information Technology (CS & IT)"> <meta name="citation_publisher" content="AIRCC Publishing Corporation"> <meta name="citation_authors" content="Chun-Hsien Lin and Pu-Jen Cheng"> <meta name="citation_title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model"> <meta name="citation_online_date" content="2024/04/28"> <meta name="citation_issue" content="08"> <meta name="citation_firstpage" content="201"> <meta name="citation_authors" content="Chun-Hsien Lin"> <meta name="citation_authors" content="Pu-Jen Cheng"> <meta name="citation_doi" content="10.5121/csit.2024.140819"> <meta 
name="citation_abstract_html_url" content="https://aircconline.com/csit/abstract/v14n8/csit140819.html"> <meta name="citation_pdf_url" content="https://aircconline.com/csit/papers/vol14/csit140819.pdf"> <!-- end citation meta tags --> <!-- Og meta tags --> <meta property="og:site_name" content="AIRCC" /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://aircconline.com/csit/abstract/v14n8/csit140819.html"> <meta property="og:title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model"> <meta property="og:description" content="With the development of large-scale Language Models (LLM), fine-tuning pre-trained LLM has become a mainstream paradigm for solving downstream tasks of natural language processing. However, training a language model in the legal field requires a large number of legal documents so that the language model can learn legal terminology and the particularity of the format of legal documents. The typical NLP approaches usually rely on many manually annotated data sets for training. However, in the legal field application, it is difficult to obtain a large number of manually annotated data sets, which restricts the typical method applied to the task of drafting legal documents. 
The experimental results of this paper show that not only can we leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model, but more importantly, it can fine-tune a pre-trained LLM on the local computer to achieve the generating legal document drafts task, and at the same time achieve the protection of information privacy and to improve information security issues."/> <!-- end og meta tags --> <!-- Start of twitter tags --> <meta name="twitter:card" content="Proceedings" /> <meta name="twitter:site" content="AIRCC" /> <meta name="twitter:title" content="Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model" /> <meta name="twitter:description" content="With the development of large-scale Language Models (LLM), fine-tuning pre-trained LLM has become a mainstream paradigm for solving downstream tasks of natural language processing. However, training a language model in the legal field requires a large number of legal documents so that the language model can learn legal terminology and the particularity of the format of legal documents. The typical NLP approaches usually rely on many manually annotated data sets for training. However, in the legal field application, it is difficult to obtain a large number of manually annotated data sets, which restricts the typical method applied to the task of drafting legal documents. 
The experimental results of this paper show that not only can we leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model, but more importantly, it can fine-tune a pre-trained LLM on the local computer to achieve the generating legal document drafts task, and at the same time achieve the protection of information privacy and to improve information security issues."/> <meta name="twitter:image" content="https://airccse.org/img/aircc-logo1.jpg" /> <!-- End of twitter tags --> <style type="text/css"> .rdd { text-align: center; background: #f2f2f2; color: #000; font-weight: 700; width: 130px; height: 110px; border-radius: 100%; box-shadow: inset 1px 0px 22px 3px #4080ca; font-family: 'Oswald', sans-serif; border: 5px solid #1f8ea3; margin: 5% auto; line-height: 110px; } </style> <script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script> <script> (adsbygoogle = window.adsbygoogle || []).push({ google_ad_client: "ca-pub-1537319084895272", enable_page_level_ads: true }); </script> </head> <body> <!-- Responsive NavBar --> <div class="navbar-fixed"> <nav class="cyan lighten-2 z-depth-5"> <div class="container"> <div class="nav-wrapper"> <ul> <li id="b-logo"> <img id="brand-logo" href="index.php" class="hide-on-med-and-down" src="img/aircc-logo1.jpg"> </li> </ul> <a class="brand-logo" href="index.php">AIRCC</a> <a data-activates="side-nav" class="button-collapse show-on-small left"> <i class="material-icons">menu</i> </a> <ul class="right hide-on-med-and-down"> <li > <a href="https://aircconline.com/">Home</a> </li> <li> <a href="https://airccse.org/csit/V14N08.html">Current Issue</a> </li> <li> <a href="https://airccse.org/arch.html">Archives</a> </li> <li> <a href="https://airccse.org/csit/acontact.html">Contact</a> </li> <li> <a class="openIcon" onclick="openSearch()"> <i class="material-icons">search</i> </a> </li> </ul> </div> </div> </nav> </div> <!-- SIDE NAVBAR 
--> <ul class="side-nav" id="side-nav"> <li> <div class="user-view arc"> <div class="background"> <img class="mobile-overlay" > </div> <a href="https://aircconline.com/"> <i id="cl" class="material-icons cyan-text text-lighten-2 right">close</i> </a> <a href="https://aircconline.com/"> <img class="circle" src="img/aircc-logo1.jpg"> </a> <h4 class="grey-text">AIRCC</h4> </div> </li> <li > <a href="https://aircconline.com/">Home <i class="material-icons">home</i> </a> </li> <li> <a href="https://airccse.org/csit/V14N08.html">Current Issue <i class="fas fa-users"></i> </a> </li> <li> <a href="https://airccse.org/arch.html">Archives <i class="fas fa-users"></i> </a> </li> <li> <a href="https://airccse.org/csit/acontact.html">Contact <i class="fas fa-calendar-alt"></i> </a> </li> <li> <!-- Search Bar --> <li> <a class="openIcon" id="icon" onclick="openSearch()"> <i class="material-icons">search</i> Search </a> </li> </ul> <!-- Search Icon Overlay Content --> <div id="myOverlay" class="overlay"> <span class="closeIcon" onclick="closeSearch()" title="Close Overlay">×</span> <div class="overlay-content"> <form action="https://airccj.org/csecfp/library/index.php"> <input type="text" placeholder="Search.." 
name="title"> <button type="submit"> <i class="material-icons center">search</i> </button> </form> </div> </div> <!-- Main Section - Left --> <section class="section-main"> <div class="container"> <div class="row"> <div class="col s12 m8"> <div class="card z-depth-2"> <div class="card-content"> <h5 class="cyan-text center text-darken-1"> Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model </h5> </div> </div> <br> <div class="card"> <h5 id="about" class="brown-text text-darken-2 text-center" style="padding-bottom:0px">Authors</h5> <!-- <div class="divider"></div> --> <div class="card-content"> <p class="left-text" style="text-align:justify"> Chun-Hsien Lin and Pu-Jen Cheng, National Taiwan University, Taiwan </p> </div> </div> <!-- end 2020 --> <!-- Start of London United Kingdom--> <div class="card"> <h5 id="about" class="brown-text text-darken-2 text-center" style="padding-bottom:0px">Abstract</h5> <!-- <div class="divider"></div> --> <div class="card-content"> <p class="left-text" style="text-align:justify"> With the development of large language models (LLMs), fine-tuning a pre-trained LLM has become a mainstream paradigm for solving downstream natural language processing tasks. However, training a language model for the legal domain requires a large number of legal documents so that the model can learn legal terminology and the particular format of legal documents. Typical NLP approaches rely on large manually annotated data sets for training, but such data sets are difficult to obtain in the legal domain, which limits the use of these approaches for drafting legal documents. 
The experimental results of this paper show that a large number of annotation-free legal documents can be used, without Chinese word segmentation, to fine-tune a large-scale language model. More importantly, a pre-trained LLM can be fine-tuned on a local computer to generate legal document drafts while protecting information privacy and improving information security. </p> </div> </div> <div class="card"> <h5 id="about" class="brown-text text-darken-2 text-center" style="padding-bottom:0px">Keywords</h5> <!-- <div class="divider"></div> --> <div class="card-content"> <p> LLM, Legal Document Drafting, Fine-tuning Large Language Models, Text Generation </p> </div> </div> <div class="card-content"> <a href="https://aircconline.com/csit/papers/vol14/csit140819.pdf" target="_blank" class="btn btn-small lighten-2 cyan lig">Full Text</a> <a href="https://airccse.org/csit/V14N08.html" target="_blank" class="btn btn-small lighten-2 cyan lig">Volume 14 Number 08</a> </div> </div> <!-- Right Side Bar --> <div id="side-bar" class="col s12 m4"> <div id="section-main"> <div class="card side cyan lighten-2"> <div class="card-content"> <ul> <li class="ax waves-effect waves-light"> <a class="white-text" href="https://airccse.org/editorial.html" target="blank"><i class="material-icons left">account_circle</i>Editorial Board</a> <br> </li> <br> <br> <div class="divider"></div> <br> <li class="ax waves-effect waves-light"> <a class="white-text" href="https://airccse.org/arch.html" target="blank"> <i class="material-icons fa fa-archive left"></i>Archives </a> <br> </li> <br> <br> <div class="divider"></div> <br> <li class="ax waves-effect waves-light"> <a class="white-text" target="blank" href="https://airccse.org/indexing.html"> <i class="material-icons left">local_pharmacy</i>Indexing</a> </li> <br> <br> <div class="divider"></div> <br> <li class="ax waves-effect waves-light"> <a class="white-text" target="blank" 
href="http://airccse.org/faq.html"> <i class="material-icons left">quiz</i>FAQ</a> </li> </ul> </div> </div> </div> </div> </div> </div> </section> <!-- Dummy Div--> <div id="txtcnt"></div> <!-- Section: Footer --> <footer class="page-footer cyan lighten-3"> <div class="container"> <div class="row"> <div class="footer-m col s12 m6 l3 "> <ul> <li> <img src="img/since2008.png" alt="since2008"> </li> </ul> </div> <div class="footer-m col s12 m6 l3 "> <ul> <li> <a class="white-text" href="ethics.html">Ethics</a> </li> <li> <a class="white-text" href="faq.html">FAQ</a> </li> <li> <a class="white-text" href="subscription.html">Subscription</a> </li> </ul> </div> <div class="footer-m col s12 m6 l3 offset-m1"> <ul> <li> <a class="white-text" href="acontact.html">Contact</a> </li> <li> <a class="white-text" href="https://airccse.org/sitemap.html">Sitemap</a> </li> </ul> </div> <div class="social col s12 m6 l3 offset-m1"> <ul> <li> <a class="blue-text text-darken-4" href="https://www.facebook.com/AIRCCPC" target="blank"> <i class="fab fa-facebook"> </i> </a> </li> <li> <a class="cyan-text " href="https://twitter.com/AIRCCFP" target="blank"> <i class="fab fa-twitter"></i> </a> </li> <li> <a class="red-text text-darken-4" href="https://www.youtube.com/channel/UCzkuYvuKuNCIc3jbE52IeZg" target="blank"> <i class="fab fa-youtube"></i> </a> </li> </ul> </div> </div> </div> <div class="footer-copyright grey darken-2"> <div class="container center-align"> <large class="white-text">Not for Profit @ All Rights Reserved ® AIRCC </large> </div> </div> <!-- Credit to The Delivery Team --> <div class="col s12 m10 offset-m1"> <div class="grey darken-3 center-align"> <small class="white-text">Designed and Developed by Wireilla Delivery Team</small> </div> </div> </footer> <script type="text/javascript" src="https://code.jquery.com/jquery-3.2.1.min.js"></script> <script type="text/javascript" src="js/materialize.min.js"></script> <script src="js/scrolltop.js"></script> <script 
src="js/search.js"></script> <script src="js/popup.js"></script> <script src="js/main.jquery.js"></script> </body> <!--Import jQuery before materialize.js--> </html>