How to remove duplicate lines from a HUGE text file in batch

1 year ago

To remove duplicate lines from a large text file, you can use the sort and uniq commands available on Unix-like systems such as Linux and macOS. However, the Windows Command Prompt does not provide an equivalent native command for removing duplicate lines, so a pure batch script is not a good fit for this task.

One option is to run sort through the Windows Subsystem for Linux (WSL), which is available on Windows 10 (version 2004 or later) and Windows 11. Here's how you can achieve this:

Install the Windows Subsystem for Linux (WSL):
- Open PowerShell as an administrator.
- Run the following command to enable WSL and install the default Ubuntu distribution:
```
wsl --install
```
- Follow the on-screen instructions to complete the installation, and restart Windows if prompted.
Launch the WSL terminal:
- Open the Start menu and search for "Ubuntu" or the name of the Linux distribution you installed.
- Launch the WSL terminal.
Navigate to the directory containing your text file:
- Use the cd command to navigate to the directory where your large text file is located. Inside WSL, Windows drives are mounted under /mnt, so C:\data becomes /mnt/c/data.
Run the following command to remove duplicate lines from the text file:
```
sort -u input.txt > output.txt
```
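Replace input.txt with the name of your input file, and output.txt with the desired name for the output file. The -u option makes sort drop duplicate lines as it sorts, so a separate uniq step is not needed. Keep in mind that the output is in sorted order, not in the original line order.
Note that this command may take some time to complete depending on the size of your file. For files that do not fit in memory, sort writes temporary files to disk, so make sure there is enough free disk space.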

After the command finishes executing, you will have a new file (output.txt) without duplicate lines.
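
For intuition about how sorting can remove duplicates from a file that is larger than memory, here is a minimal Python sketch of the general "sort chunks, then merge" idea. It is an illustration only, not the actual implementation of sort; the function names and the chunk_lines parameter are made up for this example, and lines are compared as raw bytes.

```python
import heapq
import tempfile


def _write_sorted_chunk(lines):
    """Write one sorted, de-duplicated chunk to a temp file and rewind it."""
    tmp = tempfile.TemporaryFile()  # opened in binary read/write mode
    tmp.writelines(sorted(set(lines)))
    tmp.seek(0)
    return tmp


def external_sort_unique(src_path, dst_path, chunk_lines=1_000_000):
    """Write the sorted, unique lines of src_path to dst_path.

    Only about chunk_lines lines are held in memory at any time.
    """
    chunk_files = []
    try:
        with open(src_path, "rb") as src:
            chunk = []
            for line in src:
                # Normalize line endings so "a\r\n", "a\n" and a final "a" match.
                chunk.append(line.rstrip(b"\r\n") + b"\n")
                if len(chunk) >= chunk_lines:
                    chunk_files.append(_write_sorted_chunk(chunk))
                    chunk = []
            if chunk:
                chunk_files.append(_write_sorted_chunk(chunk))

        # Merge the sorted chunks; equal lines come out next to each other,
        # so comparing with the previous line is enough to drop duplicates.
        previous = None
        with open(dst_path, "wb") as dst:
            for line in heapq.merge(*chunk_files):
                if line != previous:
                    dst.write(line)
                    previous = line
    finally:
        for tmp in chunk_files:
            tmp.close()
```

Calling external_sort_unique("input.txt", "output.txt") would produce roughly what the sort command above does with byte-wise ordering. One temporary file stays open per chunk, so a very small chunk_lines on a very large file could run into the operating system's open-file limit.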

Alternatively, you can use external tools or scripting languages, such as Python or PowerShell, to remove duplicate lines from large text files on Windows without installing WSL. A script can read the file line by line and, unlike sort -u, keep the remaining lines in their original order.
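
As a rough illustration of the Python route, the sketch below streams the file once and keeps the first occurrence of each line. The function name dedupe_lines is hypothetical. It assumes that a 16-byte hash per unique line fits in memory, and it treats two lines with the same hash as duplicates, which makes a false match extremely unlikely but not strictly impossible.

```python
import hashlib


def dedupe_lines(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each line.

    Returns the number of lines written.
    """
    seen = set()  # 16-byte digests of the lines written so far
    kept = 0
    # Binary mode avoids decoding errors and decoding cost on huge files.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            content = line.rstrip(b"\r\n")  # ignore line-ending differences
            key = hashlib.blake2b(content, digest_size=16).digest()
            if key not in seen:
                seen.add(key)
                dst.write(content + b"\n")
                kept += 1
    return kept
```

For example, dedupe_lines("input.txt", "output.txt") writes the de-duplicated file and returns how many lines were kept. Memory use grows with the number of unique lines rather than with the file size, so a file with very many distinct lines may still be better served by the sort-based approach.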