Custom Programmes for editing large files

People sometimes need to edit a large file. I do it time and again. Word processors and spreadsheets can handle many edits in seconds, but sometimes this software just can’t do the job. For instance, how would you pick the first word of every sentence and place those words in a new file, sorted and one per line? (This is illustrated below.)

[Image: first words of sentences, sorted one per line]
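As a rough sketch of how that example could be tackled in C++: the function name firstWords is made up for illustration, and it assumes that a sentence ends with '.', '!' or '?' and that a word is a run of letters (so "Don't" would be cut to "Don").

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Collect the first word of every sentence in `text`, sorted alphabetically.
// Assumption: sentences end with '.', '!' or '?'; words contain letters only.
std::vector<std::string> firstWords(const std::string &text)
{
    std::vector<std::string> words;
    std::string current;
    bool atSentenceStart = true;

    for (std::size_t i = 0; i <= text.size(); ++i) {
        // Treat the end of the text as one final separator.
        unsigned char c = (i < text.size()) ? text[i] : ' ';
        if (std::isalpha(c)) {
            current += static_cast<char>(c);
            continue;
        }
        // A non-letter ends the current word, if there is one.
        if (!current.empty()) {
            if (atSentenceStart)
                words.push_back(current);
            atSentenceStart = false;
            current.clear();
        }
        if (c == '.' || c == '!' || c == '?')
            atSentenceStart = true;
    }

    std::sort(words.begin(), words.end());
    return words;
}
```

Writing the result to a new file is then a loop over the returned vector with one word per line.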

The situation may be complex at times, or else plain simple. Doing it manually is not possible. Or maybe possible in theory but not in practice ;). Non-programmers can search for some software. Programmers can write a program and get the job done in a short time. After all, the purpose of programming is to make work easier.
Without further ado, let’s take a look at 4 programmes. You can use any language; I will use C++ or PHP. Remember, the language can differ but the logic remains the same.

TASK 1: Write a thousand or more random numbers to a file. Each number should be less than 10,000. This is the kind of job a short program handles far better than manual editing.

Logic :

  1. Open a file named random.txt
  2. Inside a loop, generate random number (modulo maximum number)
  3. Convert number to character array (or string representation)
  4. Write the above string to file.
  5. Write a newline (or any other separator like comma or colon) character to file
  6. End loop
  7. Close file

Program :

#include <fstream>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <ctime>

#define LENGTH 1000
#define SIZE 10000

using namespace std;

int main()
{
    //Opens a file in output and text mode named "random.txt"
    ofstream fout("random.txt");
    if(!fout)
        return 1;

    //numbers below 10,000 need at most 4 digits plus the terminating '\0'
    char digits[6];

    //seed the generator, otherwise every run writes the same numbers
    srand((unsigned)time(NULL));

    for(int i = 0; i<LENGTH; i++)
    {
        //generate a random number Modulo size
        int num = rand()%SIZE;

        //convert integer to string
        sprintf(digits, "%d", num);

        //write the string to file
        fout.write(digits, strlen(digits));

        //place a newline after each number
        fout.write("\n", 1);
    }

    //close the file
    fout.close();

    return 0;
}

Output File :

[Image: sample of the generated random.txt]
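As an alternative to rand(), C++11 offers the <random> header. Taking rand() modulo a limit slightly favours the lower values whenever the generator's range is not an exact multiple of the limit, while a uniform_int_distribution avoids that. The sketch below is one possible variant; the function name writeRandomNumbers and its parameters are made up for illustration.

```cpp
#include <fstream>
#include <random>
#include <string>

// Write `count` random integers in the range [0, limit) to `path`,
// one number per line. Returns false if the file could not be opened.
bool writeRandomNumbers(const std::string &path, int count, int limit)
{
    std::ofstream fout(path.c_str());
    if (!fout)
        return false;

    // Seed a Mersenne Twister engine from the system's random device.
    std::mt19937 engine(std::random_device{}());

    // The distribution bounds are inclusive, hence limit - 1.
    std::uniform_int_distribution<int> pick(0, limit - 1);

    for (int i = 0; i < count; ++i)
        fout << pick(engine) << '\n';
    return true;
}
```

Calling writeRandomNumbers("random.txt", 1000, 10000) would produce a file of the same shape as the program above.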

 

TASK 2 : Task 1 was easy, so let’s do something more difficult. Convert a CSV (Comma Separated Values) file to an XML (eXtensible Markup Language) file.

Logic :

  1. Open the CSV file (CSV file is a regular text file, but information is separated by comma and newline)
  2. Open a blank file (to be saved as XML file)
  3. Write the XML declaration indicating the XML version, such as <?xml version="1.0"?>
  4. Read all the lines one by one until the end of file.
  5. Break line into a string array with the separator being comma
  6. Do possible processing. This will become clear in the example.
  7. Write the values obtained by breaking the line into the XML file, taking special care of opening and closing tags.
  8. Close both files
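Step 5, breaking a line into values, could look like this in C++ (the PHP program further down uses explode() for the same job). This is only a sketch: the name splitCsvLine is hypothetical, and like explode() it does not understand quoted fields that contain commas.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Break one CSV line into its values, using a comma as the separator.
// Assumption: no value contains a comma or is wrapped in quotes.
std::vector<std::string> splitCsvLine(const std::string &line)
{
    std::vector<std::string> fields;
    std::stringstream in(line);
    std::string field;

    // getline with a third argument reads up to the next ',' each time.
    while (std::getline(in, field, ','))
        fields.push_back(field);
    return fields;
}
```

For a made-up line such as "Rectangle,Blue,4,6" this yields four values, with the shape type first and the colour second.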

Example :

As shown below, I have taken a CSV file in which geometric shapes are stored along with their colour and side/length/radius properties.

[Image: CSV to XML example]

Program :

<?php

//read each line of the file into the array $csv_in, skipping empty lines
$csv_in = file('geometry.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

//open a file in write mode
$xml_out = fopen('geometry.xml', 'wb');

//write xml version to file
fwrite($xml_out, '<?xml version="1.0"?>'."\r\n");

//xml file is to be enclosed by root tag, in this case <graph> & </graph>
fwrite($xml_out, "<graph>\r\n");

//loop through each line in the csv file
foreach($csv_in as $line)
{
    //break the csv line into string values
    $values = explode(',', trim($line));

    //every entry in xml file is to be enclosed in <geometry> & </geometry>
    fwrite($xml_out, "<geometry>\r\n");

    //We have taken 2nd value in csv line to be colour
    fwrite($xml_out, "\t<shape color='".$values[1]."'>");

    //We have taken 1st value in csv line to be the shape type
    fwrite($xml_out, $values[0]."</shape>\r\n");

    switch($values[0])
    {
    case 'Square':
        //if shape is square 3rd value in csv line is side length
        fwrite($xml_out, "\t<side>".$values[2]."</side>\r\n");
        break;
    case 'Circle':
        //if shape is circle 3rd value in csv line is radius
        fwrite($xml_out, "\t<radius>".$values[2]."</radius>\r\n");
        break;
    case 'Rectangle':
        //if shape is rectangle 3rd value in csv line is length & 4th is breadth
        fwrite($xml_out, "\t<length>".$values[2]."</length>\r\n");
        fwrite($xml_out, "\t<breadth>".$values[3]."</breadth>\r\n");
        break;
    case 'Triangle':
        //if shape is triangle 3rd, 4th & 5th value in csv line are sides
        fwrite($xml_out, "\t<side>".$values[2]."</side>\r\n");
        fwrite($xml_out, "\t<side>".$values[3]."</side>\r\n");
        fwrite($xml_out, "\t<side>".$values[4]."</side>\r\n");
        break;
    }
    fwrite($xml_out, "</geometry>\r\n");
}

fwrite($xml_out, "</graph>\r\n");
fclose($xml_out);

?>

Note : In the above program I have not validated the input file (for example, an erroneous input such as an incorrect number of properties for a shape should stop the program). I have also left out error checking (such as the file being missing). But you should definitely do both. Also, when generating an XML file we have complete discretion over its structure, as long as the code that later reads it expects the same structure. Thus the XML file can be created in several different ways.
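To give an idea of what such validation might involve, here is a small C++ sketch (C++ rather than PHP, to match the other tasks). The helper names expectedFieldCount and xmlEscape are made up; the field counts are read off the switch statement in the program above, and the escaping covers the five characters that XML treats specially.

```cpp
#include <cstddef>
#include <string>

// Number of CSV values expected on a line for each shape in the example:
// shape type, colour, then one to three measurements.
// Returns 0 for a shape the program does not know.
std::size_t expectedFieldCount(const std::string &shape)
{
    if (shape == "Square" || shape == "Circle") return 3;
    if (shape == "Rectangle") return 4;
    if (shape == "Triangle") return 5;
    return 0;
}

// Replace the characters that have a special meaning in XML, so that a
// value such as "Tom & Jerry" cannot break the generated file.
std::string xmlEscape(const std::string &value)
{
    std::string out;
    for (std::size_t i = 0; i < value.size(); ++i) {
        switch (value[i]) {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:   out += value[i];
        }
    }
    return out;
}
```

A converter could skip or reject any line whose number of values differs from expectedFieldCount for its shape, and pass every value through xmlEscape before writing it.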

 

TASK 3 : Delete words with fewer than 4 characters or more than 8 characters from a text file in which each word is stored on a separate line.

[Image: sample word list]

Logic :

  1. Open required file.
  2. Open an empty file with any temporary name.
  3. Read each word in the file
  4. Check if the length of the word is greater than or equal to 4 and less than or equal to 8
  5. If yes, write the word to the temp file
  6. Do this for all the words
  7. Close both files
  8. Delete input file
  9. Rename temp file with the input file’s name

Program :

#include <fstream>
#include <string>
#include <cstdio>

using namespace std;

int main()
{
    //open input file
    ifstream fin("list.txt");

    //open output file with some temporary name (to be renamed list.txt later)
    ofstream fout("temp.txt");

    string line;

    //each line holds one word; keep it only if it has 4 to 8 characters
    while(getline(fin, line))
    {
        if ( line.length() >= 4 && line.length() <= 8 )
            fout << line << "\n";
    }

    //close both the files
    fin.close();
    fout.close();

    //delete "list.txt"
    remove("list.txt");

    //give the temporary file the name of the input file
    rename("temp.txt", "list.txt");

    return 0;
}

TASK 4 : A very practical task: editing an SRT file. Suppose you want to watch a movie and you download subtitles from the internet, but the dialogues appear 54 seconds late. You can either keep searching for an appropriate SRT file, or write a program in 20 minutes and use it forever. Your task is to shift the times in an SRT file ahead or behind by a specified number of seconds.

Logic :

  1. Ask for the name of the SRT file and open the file.
  2. Open an empty file with any temporary name.
  3. Ask for the number of seconds to move the subtitles.
  4. Read each line of input file.
  5. Write the line to output file
  6. Check if the line is equal to the dialogue number. (Maintain an integer variable number starting from 1, and compare each line with number)
  7. If true, read the next line (the line after the dialogue number is always a time line of the format ‘00:02:32,634 --> 00:02:33,865’)
  8. In this line extract hour, min, sec (e.g. 00, 02, 32)
  9. Convert hour, min, sec to seconds.
  10. Add the seconds asked for in step 3 (a negative value subtracts)
  11. Convert these seconds back to hour, min, sec and write them into the line at the appropriate place.
  12. Repeat this for ending time (00:02:33)
  13. Write the modified line to the output and Increment the ‘number’.
  14. Continue with the outer loop
  15. At the end, rename temp file to original filename.
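Steps 8 to 11 can be tried in isolation before touching a real file. The following sketch uses a made-up helper, shiftTimestamp, that works on the "hh:mm:ss" part only and leaves the milliseconds out. Clamping at 00:00:00 is an assumption of this sketch for the case where subtracting would give a negative time.

```cpp
#include <cstdio>
#include <string>

// Shift an "hh:mm:ss" timestamp by `shift` seconds (negative moves it back).
// Returns the input unchanged if it cannot be parsed.
std::string shiftTimestamp(const std::string &stamp, int shift)
{
    int hour = 0, min = 0, sec = 0;
    if (std::sscanf(stamp.c_str(), "%d:%d:%d", &hour, &min, &sec) != 3)
        return stamp;

    // Step 9 and 10: convert to seconds and apply the shift.
    long total = hour * 3600L + min * 60L + sec + shift;
    if (total < 0)
        total = 0;   // assumption: clamp instead of printing a negative time

    // Step 11: convert back to hour, min, sec.
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02ld:%02ld:%02ld",
                  total / 3600, (total % 3600) / 60, total % 60);
    return buf;
}
```

With the time line from step 7 and a shift of 54 seconds, the start time 00:02:32 becomes 00:03:26 and the end time 00:02:33 becomes 00:03:27.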

[Image: sample SRT file]

Program :

#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
#include <cstdlib>

using namespace std;

//shift the "hh:mm:ss" part that starts at position pos of a time line
void shiftTime(string &line, size_t pos, int shift)
{
    //leave the line alone if it is too short to hold a time at pos
    if(line.length() < pos + 8)
        return;

    int hour = atoi(line.substr(pos, 2).c_str());
    int min  = atoi(line.substr(pos + 3, 2).c_str());
    int sec  = atoi(line.substr(pos + 6, 2).c_str());

    //convert to seconds and apply the shift, never going below zero
    sec += min*60 + hour*3600 + shift;
    if(sec < 0)
        sec = 0;

    hour = sec/3600;
    sec %= 3600;
    min = sec/60;
    sec %= 60;

    char buf[32];
    snprintf(buf, sizeof(buf), "%02d:%02d:%02d", hour, min, sec);
    line.replace(pos, 8, buf);
}

int main()
{
    cout<<"Subtitles Time Shift Utility\n\n";
    string filename;
    string line;

    cout<<"Enter the name of SRT file to be time shifted (Eg. 'Great Movie.srt') :\n";
    getline(cin, filename);

    ifstream fin(filename.c_str());
    if(!fin){
        cout<<"Error opening file "<<filename<<"\nPlease check that file is in the same folder and try again...\n\n";
        system("pause"); //Windows only: keeps the console window open
        return 1;
    }

    int shift = 0;
    cout<<"Enter number of seconds to move the subtitles of "<<filename<<" (use minus '-' for subtracting, 0 to quit) :\n";
    cin>>shift;

    //nothing to do: quit before the temporary file is created
    if(!cin || shift == 0)
        return 0;

    ofstream fout("srttemp");
    if(!fout){
        cout<<"Error creating temporary file 'srttemp'\n";
        return 1;
    }

    int number = 1;

    while(getline(fin, line))
    {
        //copy every line (dialogue number, text, blank line) unchanged
        fout<<line<<"\n";

        //ignore a trailing carriage return when comparing
        string key = line;
        if(!key.empty() && key[key.length()-1] == '\r')
            key.erase(key.length()-1);

        //the line after the dialogue number is always the time line
        if(key == to_string(number) && getline(fin, line)){
            shiftTime(line, 0, shift);  //starting time
            shiftTime(line, 17, shift); //ending time
            fout<<line<<"\n";
            number++;
        }
    }

    fout.close();
    fin.close();

    //replace the original file with the shifted one
    remove(filename.c_str());
    if(rename("srttemp", filename.c_str()) != 0){
        cout<<"Could not replace "<<filename<<"; the shifted subtitles are in 'srttemp'\n";
        return 1;
    }

    cout<<"Done!\n";
    system("pause"); //Windows only: keeps the console window open
    return 0;
}

Last words:

When working with an important document (of which you have only one copy), don’t use the “remove” and “rename” functions in the program. Save the file under a different name (like “temp.txt”) and review the changes. If everything is fine, you can go ahead and replace the file.
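Another way to stay safe is to copy the original aside before a program is allowed to overwrite it. The sketch below is one possible approach, not part of the programs above; the function name makeBackup and the ".bak" suffix are arbitrary choices.

```cpp
#include <fstream>
#include <string>

// Copy the file `path` to `path` + ".bak", byte for byte.
// Returns true if the copy was written, false if either file
// could not be opened or the write failed.
bool makeBackup(const std::string &path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return false;

    std::ofstream out((path + ".bak").c_str(), std::ios::binary);
    if (!out)
        return false;

    // Copy in chunks; the last chunk may be shorter than the buffer.
    char buffer[4096];
    while (in.read(buffer, sizeof(buffer)) || in.gcount() > 0)
        out.write(buffer, in.gcount());

    out.flush();
    return out.good();
}
```

A program would call makeBackup on the input file first and only go on to remove and rename when it returns true.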

We face a different task every day. By writing a custom programme for each situation we can deal with it in a short time. But if we add up all these times, it may make more sense to write one bigger programme that solves several kinds of problems, or at least some of them. Since we spend considerable time debugging, a multipurpose programme becomes more suitable.
