# Augmenting Your Dataset

> How to make your dataset bigger and more useful to train LLMs

## What is Data Augmentation?

Data augmentation is the process of increasing the size and diversity of a dataset, for example by generating synthetic data or by elaborating on source documents. It is used to refine and strengthen datasets before they are used to fine-tune LLMs, which can improve the performance and robustness of the resulting models.

There are two primary forms of data augmentation: elaborative and synthetic. This page explains each type and how to use it.

## Benefits of Augmentation

Synthetic augmentation increases the volume of the dataset, which is particularly useful when acquiring real-world data is difficult or expensive. A larger and more diverse dataset can lead to better model performance because it provides a broader range of training examples. By generating diverse data points, synthetic augmentation can also help reduce biases present in the original dataset.

# Synthetic

Synthetic augmentation, also known as **data synthesis**, expands an existing dataset with the help of an LLM. Existing data points are sampled, and new data points are generated based on them. The newly created data is referred to as synthesized or synthetic data.

**Process**

1. Select data points from the existing dataset. These samples serve as the foundation for generating new data.
2. Generate new data points with an LLM. The LLM analyzes the sampled data and creates new data that is consistent with the patterns and structures observed in the original dataset.
3. Integrate the synthesized data back into the original dataset, expanding it and increasing its diversity.

**Applications**

- Generating patient data to train medical models without compromising patient privacy.
- Simulating transaction data for fraud detection and other financial models.

# Elaborative

Elaborative augmentation creates a new dataset from uploaded documents. A large language model (LLM) extracts data from your documents and generates a series of prompt-completion pairs, which are stored as data points in a new dataset. Once enough data points have been generated, the new dataset can be used to fine-tune a model.

**Process**

1. Upload the documents from which you want to create the new dataset. These documents serve as the source material for data extraction.
2. The LLM analyzes the uploaded documents and extracts relevant data. This data forms the basis for generating prompt-completion pairs.
3. The LLM creates a set of prompt-completion pairs based on the extracted data. Each pair consists of a prompt (input) and a corresponding completion (output), and each pair is stored as an individual data point.
4. The generated prompt-completion pairs are aggregated to form a new dataset. This dataset grows as more documents are processed and more pairs are generated.
5. Once a sufficient number of data points have been accumulated, the new dataset can be used to fine-tune a model. Fine-tuning improves the model's ability to understand and generate text in the specific domain of the uploaded documents.

**Applications**

- Generating datasets from legal documents to train models for legal text analysis.
- Creating datasets from customer service logs to improve chatbots and automated response systems.
- Using research papers and academic articles to develop models for literature review and summarization.
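
**Example sketch**

The two processes described on this page (sample, generate, and merge for synthetic augmentation; documents to prompt-completion pairs for elaborative augmentation) can be illustrated with a short Python sketch. This is not the platform's implementation or API. The function names `synthetic_augment` and `elaborative_augment`, the `generate` and `extract_pairs` callables, and the `{"prompt": ..., "completion": ...}` record shape are all assumptions made for illustration, and the LLM calls are replaced with simple stand-in functions so the example runs offline.

```python
import random
from typing import Callable, Dict, List

Pair = Dict[str, str]  # assumed record shape: {"prompt": ..., "completion": ...}


def synthetic_augment(
    dataset: List[Pair],
    generate: Callable[[List[Pair]], Pair],  # hypothetical wrapper around an LLM call
    n_new: int,
    sample_size: int = 3,
    seed: int = 0,
) -> List[Pair]:
    """Sample existing points, generate new ones, and merge them back in."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for _ in range(n_new):
        # Step 1: sample examples from the original dataset only.
        samples = rng.sample(dataset, min(sample_size, len(dataset)))
        # Step 2: ask the generator for a new point that follows the same patterns.
        new_point = generate(samples)
        # Step 3: integrate the synthetic point into the expanded dataset.
        augmented.append(new_point)
    return augmented


def elaborative_augment(
    documents: List[str],
    extract_pairs: Callable[[str], List[Pair]],  # hypothetical wrapper around an LLM call
) -> List[Pair]:
    """Build a new dataset of prompt-completion pairs from source documents."""
    dataset: List[Pair] = []
    for doc in documents:
        for pair in extract_pairs(doc):
            # Keep only pairs where both the prompt and the completion are non-empty.
            if pair.get("prompt") and pair.get("completion"):
                dataset.append(pair)
    return dataset


# Stand-ins for real LLM calls so the sketch runs without any external service.
def fake_generate(samples: List[Pair]) -> Pair:
    base = samples[0]
    return {"prompt": base["prompt"] + " (variant)", "completion": base["completion"]}


def fake_extract_pairs(doc: str) -> List[Pair]:
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    return [
        {"prompt": f"What does sentence {i + 1} of the document say?", "completion": s}
        for i, s in enumerate(sentences)
    ]


seed_data = [{"prompt": "What is 2 + 2?", "completion": "4"}]
bigger = synthetic_augment(seed_data, fake_generate, n_new=2)
from_docs = elaborative_augment(["Cats purr. Dogs bark."], fake_extract_pairs)
```

In a real pipeline, `fake_generate` and `fake_extract_pairs` would be replaced by calls to whichever LLM you use. Sampling only from the original dataset, rather than from the growing augmented list, is a design choice in this sketch that keeps synthetic points from being generated from other synthetic points.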