Listserv to MBox format

From: Terry Howerton <terry_at_scouter.com_at_hypermail-project.org>
Date: Tue, 7 Dec 1999 22:12:35 -0600
Message-ID: <NDBBKDFICKDFKNDCEMPGAEOLEDAA.terry_at_SCOUTER.com>


Below is a message from several months ago about converting Listserv(tm)-style mailboxes to UNIX mbox format. I am trying to run the script on a large mailbox with about 1,500 messages, but to no avail.

The command line on Win32 that I am using is:

ls2mail.pl inbox.txt outbox.mbox

That cycles through all the messages, but does not write anything to the outbox file and does not modify the inbox file.

I have two questions:

  1. Do current versions of Hypermail work on the listserv style mail?
  2. Any idea what might be wrong with the script below, or is there a better way to convert this mail?

TERRY HOWERTON
>Below please find a Perl script I helped to write for archiving the
>ADV-HTML mailing list with Hypermail. I originally wrote the script for
>Patrick Douglas Crispen <crispen_at_netsquirrel.com>, but he said it would
>be okay to include the script with Hypermail (if so desired).
>
>This script appears to function similarly to the n2folder Perl script
>that Peter Murray <pem_at_po.cwru.edu> recently sent, and they may even
>operate on the same type of listserv archive.
>
>One unique feature that ls2mail may have is that it generates a
>"Message-ID" for each message when converting the listserv archive to an
>mbox archive. While this won't help when linking message threads, it
>did help Hypermail (possibly required by the old 1.x versions?) to
>create its output.
>
>Dave
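
The Message-ID generation described in the quoted message can be sketched on its own as follows. This is only an illustration, not part of ls2mail: the helper name make_message_id and the example address are hypothetical. It joins the words of the "Date:" value and the sender address with dots, upper-cases the result, and drops every character other than letters, digits, '.' and '@'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ls2mail): build a Message-ID string
# from a "Date:" header value and a sender address.
sub make_message_id {
    my ($date_value, $from_address) = @_;

    my @date = split ' ', $date_value;            # split date on whitespace
    my $id   = uc join '.', @date, $from_address; # join with dots, upper-case
    $id =~ tr/A-Z0-9.@//cd;                       # delete all other characters

    return $id;
}

# Example address is made up for illustration.
print make_message_id('Tue, 7 Dec 1999 22:12:35 -0600', 'user@example.com'), "\n";
# prints TUE.7.DEC.1999.221235.0600.USER@EXAMPLE.COM
```

Because the ID is derived only from the date and the sender, two messages sent by the same address within the same second would receive the same ID.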

#!/usr/local/bin/perl
#
# ls2mail -- converts listserv formatted digests to UNIX "mail" format
#
# Usage:
#   ./ls2mail < infile > outfile
#   cat infile | ./ls2mail > outfile
#
# Written by David Kilzer <ddkilzer_at_ti.com>
# Tue, Mar 24, 1998
#

use strict;

my $first_time = 1;	# marks first time through script
my $line;		# stores one input line
my $header;		# stores mail message header lines
my $from_address;	# stores "From:" address
my @date;		# stores "Date:" information
my $message_id;		# stores new message ID info


while ($line = <>)	# use '<>' operator so we act like a UNIX filter
{
  chomp ($line);	# remove extra newlines

  if ($line !~ m/^={73}$/)	# Separator line?
  {
    print $line, "\n";		# Not separator, just print
  }
  else				# Found separator line, process
  {
    $header = "";	# clear variable
    $from_address = "";	# clear variable
    @date = ();		# clear variable
    $message_id = "";	# clear variable

    # Read in email header lines

    while ($line = <>)
    {

      last if ($line =~ m/^\s*$/);  # message header ends with "blank" line
      $header .= $line;		# add $line to $header
    }

    $header =~ s/^Sender:\s/To: /mi; # change "Sender:" to "To:"

    $header =~ s/^([^\s:]+:)\s+/$1 /mg; # remove extra space from all lines

    $header =~ s/\n\s+/\n /mg; # continued lines used 8 spaces

    # Find "From" address to use
    if ($header =~ m/^Reply-To:\s.*<([^>]+)>/mi)     {
      $from_address = $1;
    }
    elsif ($header =~ m/^Reply-To:\s.*\n\s.*<([^>]+)>/mi)     {
      $from_address = $1;
    }

    $header =~ m/^Date:\s(.*)$/mi;	# find "Date" header
    @date = split (' ', $1);		# split date into an array
    $date[0] =~ tr/,//d;		# remove commas from first date element
    $date[1] = " " . $date[1] if (length($date[1]) == 1);
    					# add space to single days

    $message_id = uc (join ('.', @date, $from_address)); # create message ID
    $message_id =~ tr/A-Z0-9.@//cd;	# remove bad characters (tr takes ranges, not [] classes)

    # Print new UNIX mail header
    print "\n" if (! $first_time);
    $first_time &&= 0;

    print "From $from_address $date[0] $date[2] $date[1] $date[4] $date[3]\n";
    print "Message-Id: <$message_id>\n";
    print $header, "\n";
  }
}

exit 0;

__END__
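
A note on invocation: the script reads with the '<>' operator and writes with bare print, so Perl treats every file name on the command line as input and sends all output to standard output. The usage comment at the top of the script therefore calls for redirection (./ls2mail < infile > outfile). If redirection is inconvenient, one possible approach is to point '<>' and print at named files before running the conversion loop. The following is a minimal sketch under that assumption; the helper name run_with_files is hypothetical and not part of ls2mail, and the pass-through loop merely stands in for the real conversion loop.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ls2mail): run $body so that '<>' reads
# only from $in_name and bare print writes to $out_name.
sub run_with_files {
    my ($in_name, $out_name, $body) = @_;

    open(my $out, '>', $out_name) or die "Cannot write $out_name: $!";

    local @ARGV = ($in_name);     # '<>' will now read only this file
    my $old_handle = select($out); # bare print now goes to $out

    $body->();                    # e.g. the ls2mail while loop

    select($old_handle);          # restore the previous default handle
    close($out) or die "Cannot close $out_name: $!";
    return;
}

# Does nothing unless exactly two file names are given.
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    run_with_files($in_name, $out_name, sub {
        while (my $line = <>) {   # placeholder for the conversion loop
            print $line;
        }
    });
}
```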

Received on Wed 08 Dec 1999 06:12:37 AM GMT
