Gossamer Forum
error after SSH download

Hi Andy,

I've been trying to get the Ultranerds DMOZ script working for two weeks. It looks very polished, but I can't get past one point...

putty SSH log begins:
login as: x
x@web4.hostingcompany.com password: x
Last login: Wed Feb 28 20:43:24 2007 from x.x.x.x

-sh-2.05b$ cd /hsphere/local/home/example/mysite.com/cgi/admin
-sh-2.05b$ perl dmoz_cron.cgi > log.txt &
[1] 27520
-sh-2.05b$ --14:58:48-- http://rdf.dmoz.org/rdf/content.rdf.u8.gz
=> `content.rdf.u8.gz'
Resolving rdf.dmoz.org... 207.200.81.178
Connecting to rdf.dmoz.org|207.200.81.178|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 326,839,153 (312M) [application/x-gzip]


99% [===================================> ] 326,558,632 359.76K/s ETA 00:01
99% [===================================> ] 326,627,368 358.77K/s ETA 00:01
99% [===================================> ] 326,693,976 356.50K/s ETA 00:01
99% [===================================> ] 326,762,040 348.38K/s ETA 00:00
100%[====================================>] 326,839,153 350.68K/s ETA 00:00

15:18:58 (263.89 KB/s) - `content.rdf.u8.gz' saved [326839153/326839153]

mv: `content.rdf.u8.gz' and `/hsphere/local/home/example/mysite.com/cgi/admin/content.rdf.u8.gz' are the same file


Last edited by cronjob: Apr 9, 2007, 8:33 AM

Re: [cronjob] error after SSH download
Hi,

Sorry for the delay in getting back to you. Could you please send over your dmoz_cron.cgi script (in your /cgi-bin/admin/ folder).

Also, could you give me a few more details on how you are running it? Is it via the "setup cronjob" option in DMOZ_Wizard, or via a custom cronjob (e.g. some people set it up so that the script just wipes out their current data and re-imports the DMOZ data from scratch)?

TIA, and apologies for the delay (been without Internet for over 3 weeks).

Cheers
Andy
Programmer/Designer/LinksSQL Freak

http://www.ultranerds.com
http://www.imagesql.com
Re: [Andy] error after SSH download
Agghh, my hair is thinning. I've tried it so many ways.

Here is the story:

Visit http://mysite.com/cgi/admin/admin.cgi
plugins
DMOZ
Enter /Regional/Europe/Ireland/
Not ticking the cron job (I do things manually)
Click the form button, wait a second, then the site returns an HTML page:

* Blanked out dmoz_cron.cgi
* Wrote slicing-up code for appropriate categories:

Regional/Europe/Ireland


Script has been setup to run. All you need to do now, is log in via Telnet/SSH, and type;

cd /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin <return>

perl dmoz_cron.cgi > log.txt &

If you have imported any categories that have non-standard English characters, then you need to also run;
perl utfupdate.cgi > log2.txt &
This will replace any non-standard English characters with their respective true values (an old bug in the DMOZ RDF file)

Load PuTTY and log in via SSH

cd /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin
perl dmoz_cron.cgi > log.txt &

downloading 1%....100%

but then "21:28:11 (340.65 KB/s) - `content.rdf.u8.gz' saved [326706303/326706303]

mv: `content.rdf.u8.gz' and `/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8.gz' are the same file"


then an email arrives

Import reported an error. The log file /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/logs/SliceLog_Regional_Europe_Ireland.log contains;

Trying to import: Regional/Europe/Ireland



Error: Unable to read rdf file '/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8'. Reason: No such file or directory





here is the dmoz_cron.cgi, in code mode to preserve formatting

Code
#!/usr/bin/perl   

use strict;
use lib '/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin';
use GT::Base;
use GT::Plugins qw/STOP CONTINUE/;
use Links qw/$CFG $IN $DB/;
use LWP::Simple;
use Links::Plugins;
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

# Inherit from base class for debug and error methods
@Plugins::DMOZ_Wizard::ISA = qw(GT::Base);

# Change directory to our admin one...
chdir("/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin");
Links::init("/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin");

# Grab the variable settings we need...
my $cfg = Links::Plugins::get_plugin_user_cfg ('DMOZ_Wizard');
my $_AdminEmail = $cfg->{'Admin_Email'};


# only add this code if the file actually exists..
if (-e "/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8") { print "Removing old content.rdf.u8 and content.rdf.u8.gz files...\n\n"; unlink <content.rdf.*>; }

# grab the main stuff, and do code to parse the slices too...
print "Grabbing http://rdf.dmoz.org/rdf/content.rdf.u8.gz\n\n";
`wget http://rdf.dmoz.org/rdf/content.rdf.u8.gz > /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/logs/DownloadLog.log`;
`mv content.rdf.u8.gz /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8.gz`;
`gzip -d /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8.gz`;

# clean up the content.rdf.u8 file...
&run_rdf_cleanup;

sub send_bad_notice {

my $log = $_[0];
my $sub = $_[1];
my $cat= $log;
$cat =~ s/SliceLog\_//;
$cat =~ s/\.log//;
$cat =~ s/logs\///;

open(READ,"$log") || die "Cant open $log. Reason: $!";
my @data = <READ>;
close(READ);

my $joined = join ("\n",@data);

my $msg = "Import reported an error. The log file $log contains;\n\n" . $joined;

use GT::Mail;
$GT::Mail::error ||= ''; # Silence -w
my $mail = new GT::Mail;
$mail->send (
smtp => $CFG->{db_smtp_server},
sendmail => $CFG->{db_mail_path},
from => $_AdminEmail,
subject => "$sub ($cat)...",
to => $_AdminEmail,
msg => $msg,
debug => $Links::DEBUG
) or Links::fatal ("Unable to send mail: $GT::Mail::error");

} ## end send_bad_notice

sub send_good_notice {

my $log = $_[0];
my $sub = $_[1];
my $cat = $log;
$cat =~ s/SliceLog\_//;
$cat =~ s/\.log//;
$cat =~ s/logs\///;

open(READ,"$log") || die "Cant open $log. Reason: $!";
my @data = <READ>;
close(READ);

my $joined = join ("\n",@data);

my $msg = "Import reported ok. The log file is $log";

use GT::Mail;
$GT::Mail::error ||= ''; # Silence -w
my $mail = new GT::Mail;
$mail->send (

smtp => $CFG->{db_smtp_server},
sendmail => $CFG->{db_mail_path},
from => $_AdminEmail,
subject => "$sub ($cat)...",
to => $_AdminEmail,
msg => $msg,
debug => $Links::DEBUG
) or Links::fatal ("Unable to send mail: $GT::Mail::error");


} ## end send_good_notice

sub Delete_Listing {
# ------------------------------------------------------------
# Here is where we delete a cron listing if they ask us to...
#

my $script_cgi = "dmoz_cron.cgi";
my ($exists,$line, $line_found, $put_back);
my ($into_cron,$back_into_file,$found,@input, @res, $report, $location, $current_dir, $bad);

# grab the cron file first...so we get the updated version locally...
`crontab -l > $CFG->{admin_root_path}/cron.txt`;


# now open the file, and check to see if this option has already been set...
open(FILEIN, "$CFG->{admin_root_path}/cron.txt") || &error("Can't open cron.txt. Reason: $!");
my @input = <FILEIN>;
close(FILEIN);


foreach(@input) {


chomp;
next if /^#/; # skip comments

my $admin_path = $CFG->{admin_root_path};
if ($_ =~ /$admin_path\/$script_cgi/i) { next; } else { $put_back .= $_ . "\n" }
# if it exists, we skip that line, i.e. we don't put it back into the file...

}

# write it all back into the cron.txt file, ready to be executed...
open(CRONWRITE, ">$CFG->{admin_root_path}/cron.txt") || &error("Unable to open cron.txt. Reason: $!");
print CRONWRITE $put_back;
close(CRONWRITE);

# get the folder location of this script...
$current_dir = $ENV{SCRIPT_FILENAME}; $current_dir =~ s/admin\.cgi//;

# now we can add it...
$location = $current_dir . "cron.txt";
open PIPE, "crontab $location |";
@res = <PIPE>;
close PIPE or $bad = "$?,$!";

# catch the error if we have one :D
if ($bad) { print "\n\nERROR: Could not delete cronjob. Reason: $bad"; } else { print "\n\nRemoved cronjob entry ok...."; }

} ## end Delete_Listing



# clean up the main RDF file...will NOT work with BIG5, but only UTF8
sub run_rdf_cleanup {

print "Cleaning up RDF file... \n";

`mv content.rdf.u8 content.rdf.u8.2`;

open (CONTENT,"/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8.2") || die $!;
open (WRITEIT,">/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8") || die $!;
while (<CONTENT>) {
if (/[\200-\377]/) {
s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
}
print WRITEIT $_;
}
close(WRITEIT);
close(CONTENT);

}

# Start extracting for Regional/Europe/Ireland
print "Extracting Top/Regional/Europe/Ireland ... ";
`perl /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/do_dump.cgi --rdf=/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8 --out=/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/dmoz.dump --cat=Regional/Europe/Ireland > /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/logs/SliceLog_Regional_Europe_Ireland.log`;
# Run import for Regional/Europe/Ireland
`perl /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/nph-import.cgi --import=RDF --source="/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/dmoz.dump" --destination="/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/defs" --rdf-category="Top/Regional/Europe/Ireland" --rdf-destination="Regional/Europe/Ireland" --rdf-add-date="2003-09-06" --rdf-update > /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/logs/Regional_Europe_Ireland.log`;
# Lets do a rebuild of the database, so all category counts are correct...
print "Repairing table count on Database ... \n";
`perl /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/nph-build.cgi --repair`;



Last edited by cronjob: Apr 8, 2007, 4:02 AM

Re: [cronjob] error after SSH download
Hi,

What happens if you log in via Telnet/SSH, and then run:


Code
perl /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/do_dump.cgi --rdf=/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8 --out=/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/dmoz.dump --cat=Regional/Europe/Ireland


Cheers
Andy
Re: [Andy] error after SSH download

Code
BEGIN PUTTY LOG... 

-sh-2.05b$ perl /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/do_dump.cgi --rdf=/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8 --out=/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/dmoz.dump --cat=Regional/Europe/Ireland

Trying to import: Regional/Europe/Ireland

Error: Unable to read rdf file '/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8'. Reason: No such file or directory
-sh-2.05b$

Re: [cronjob] error after SSH download
Hi,

Do you have enough disk space? It sounds like it's having problems extracting the RDF file from the .gz file.
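A quick way to check free space from the shell is sketched below. It is only an illustration: `free_kb` and `has_room` are hypothetical helper names, and the 4 GB threshold is an assumption (roughly two uncompressed copies of the RDF file), not a figure from DMOZ_Wizard.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DMOZ_Wizard.

# Print the free space (in KB) on the filesystem that holds directory $1.
free_kb() {
    # -P forces the one-line-per-filesystem POSIX format; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if directory $1 has at least $2 KB free.
has_room() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Assumed threshold: 4 GB = 4194304 KB. Adjust to the real size of the RDF dump.
if has_room . 4194304; then
    echo "enough free space for the extract"
else
    echo "free space looks too low for the extract"
fi
```

Run it from the admin directory. Note that df reports free space on the filesystem; on a shared host a per-account quota may also apply, and df does not show that.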

Cheers
Andy
Re: [Andy] error after SSH download
Yes, my host allows 10GB of storage, and before the download I had 7.5GB free. I'll try the whole thing again and will post back in 30 minutes.
Re: [cronjob] error after SSH download
OK, I started from the top. I FTPed to the directory to make sure the huge downloads were not there, then went to mysite.com/cgi/admin/admin.cgi and ran the setup (I didn't tick the cronjob tickbox; that's OK though?). I ran the script, and it told me to SSH to mysite, which I did as follows...


Code
  
Last login: Sat Apr 7 22:37:57 2007 from [removed by cronjob]
-sh-2.05b$ cd /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin
-sh-2.05b$ perl dmoz_cron.cgi > log.txt &
[1] 10374
-sh-2.05b$ --11:43:29-- http://rdf.dmoz.org/rdf/content.rdf.u8.gz
=> `content.rdf.u8.gz.1'
Resolving rdf.dmoz.org... 207.200.81.178
Connecting to rdf.dmoz.org|207.200.81.178|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 326,811,325 (312M) [application/x-gzip]

100%[====================================>] 326,811,325 398.99K/s ETA 00:00

11:57:04 (391.69 KB/s) - `content.rdf.u8.gz.1' saved [326811325/326811325]

mv: `content.rdf.u8.gz' and `/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8.gz' are the same file


Look at the red error message (my emphasis). It doesn't look like a lack of space; it looks more complicated.

Here are the FTP contents of the directory, if it helps. Could it be a CHMOD problem? And why are there three red files (my emphasis)?


Code
  
drwxr-xr-x 19 davidosu davidosu 4096 Oct 5 2006 GT
drwxr-xr-x 9 davidosu davidosu 4096 Oct 5 2006 Links
-r-xr--r-- 1 davidosu davidosu 35834 Aug 1 2006 Links.pm
drwxr-xr-x 3 davidosu davidosu 4096 Feb 19 17:13 Net
drwxrwxrwx 4 davidosu davidosu 4096 Feb 19 17:16 Plugins
-rwxr-xr-x 1 davidosu davidosu 8986 Aug 1 2006 admin.cgi
drwxrwxrwx 2 davidosu davidosu 4096 Feb 28 18:50 backup
-r-xr--r-- 1 davidosu davidosu 2944 Jan 13 2004 bases.pm
-r-xr--r-- 1 davidosu davidosu 56505 Oct 5 2006 checksums.dat
-r-xr--r-- 1 davidosu davidosu 1889 Aug 12 2004 clean_bad_word_links.cgi
-r-xr--r-- 1 davidosu davidosu 4708 Jan 13 2004 constants.pm
-rw-r--r-- 1 davidosu davidosu 93224960 Apr 8 11:01 content.rdf.u8
-rw-r--r-- 1 davidosu davidosu 2097380374 Mar 27 22:33 content.rdf.u8.2
-rw-r--r-- 1 davidosu davidosu 326811325 Apr 5 00:47 content.rdf.u8.gz.1

drwxr-xr-x 2 davidosu davidosu 4096 Oct 5 2006 cron
-rw-r--r-- 1 davidosu davidosu 94 Feb 28 19:41 cron.txt
drwxrwxrwx 2 davidosu davidosu 4096 Apr 8 10:42 defs
-rwxrwxrwx 1 davidosu davidosu 8893 Apr 8 10:42 dmoz_cron.cgi
-r-xr--r-- 1 davidosu davidosu 56885 Feb 19 17:22 dmoz_pre_backup.sql
-r-xr--r-- 1 davidosu davidosu 3041 Jul 1 2003 do_dump.cgi
-rwxrwxrwx 1 davidosu davidosu 77 Apr 8 11:00 log.txt
-rwxrwxrwx 1 davidosu davidosu 49 Feb 28 21:02 log2.txt
drwxrwxrwx 2 davidosu davidosu 4096 Mar 29 20:12 logs
drwxr-xr-x 3 davidosu davidosu 4096 Oct 5 2006 mysqlman
-r-xr--r-- 1 davidosu davidosu 34007 Oct 5 2006 nph-build.cgi
-r-xr--r-- 1 davidosu davidosu 6236 Jul 12 2005 nph-email.cgi
-r-xr--r-- 1 davidosu davidosu 9910 Sep 19 2005 nph-import.cgi
-r-xr--r-- 1 davidosu davidosu 7090 May 4 2006 nph-index.cgi
-r-xr--r-- 1 davidosu davidosu 14822 Feb 15 00:39 nph-verify.cgi
-r-xr--r-- 1 davidosu davidosu 14981 Jul 28 2006 setup.cgi
drwxrwxrwx 6 davidosu davidosu 4096 Oct 5 2006 templates
drwxrwxrwx 2 davidosu davidosu 4096 Apr 8 10:42 tmp
drwxrwxrwx 2 davidosu davidosu 4096 Apr 8 10:42 updates
-r-xr--r-- 1 davidosu davidosu 2353 Nov 22 2003 utfupdate.cgi
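For what it's worth, the mv message on its own is probably not the failure. The script first does a chdir into the admin directory and then moves content.rdf.u8.gz to the same absolute path, so mv is being asked to move the file onto itself. A small sketch of that situation, using a throwaway temp directory and a dummy file (both are assumptions for the demo, not the real paths):

```shell
#!/bin/sh
# Demo only: reproduce "mv a file onto itself" in a scratch directory.
work=$(mktemp -d)
printf 'payload\n' > "$work/content.rdf.u8.gz"

# Same pattern as the script: cd into the directory, then mv the
# relative name to the absolute path of the very same file.
( cd "$work" && mv content.rdf.u8.gz "$work/content.rdf.u8.gz" ) \
    || echo "mv reported an error, as in the log"

# The file is untouched either way.
cat "$work/content.rdf.u8.gz"
```

GNU mv refuses the move with the "are the same file" message and leaves the file alone, so nothing is lost at that step.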

Last edited by cronjob: Apr 8, 2007, 4:13 AM

Re: [cronjob] error after SSH download
Hi,

Mmm... it looks more like the old files are just not being removed.

1) Delete:

content.rdf.u8
content.rdf.u8.2
content.rdf.u8.gz.1

2) In dmoz_cron.cgi, find:


Code
   # clean up the content.rdf.u8 file... 
&run_rdf_cleanup;


..and comment out, like so:


Code
   # clean up the content.rdf.u8 file... 
# &run_rdf_cleanup;


3) Save the file, and then try re-running.
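A possible shell version of step 1 is sketched below. It also removes a plain content.rdf.u8.gz, on the assumption that a leftover copy is why wget saved the new download as content.rdf.u8.gz.1 in the log above (by default wget adds a numeric suffix instead of overwriting an existing file). `clean_stale_rdf` is a hypothetical helper name, and the demo runs in a temp directory rather than the real admin folder.

```shell
#!/bin/sh
# Hypothetical helper, not part of DMOZ_Wizard.

# Remove leftovers from earlier runs in directory $1 so the next
# download lands on the expected file name.
clean_stale_rdf() {
    dir=$1
    # -f keeps rm quiet when a file (or an unmatched glob) is not there.
    rm -f "$dir"/content.rdf.u8 \
          "$dir"/content.rdf.u8.2 \
          "$dir"/content.rdf.u8.gz \
          "$dir"/content.rdf.u8.gz.*
}

# Demo in a scratch directory with dummy files.
demo=$(mktemp -d)
touch "$demo/content.rdf.u8" "$demo/content.rdf.u8.2" \
      "$demo/content.rdf.u8.gz.1" "$demo/admin.cgi"
clean_stale_rdf "$demo"
ls "$demo"    # only admin.cgi is left
```

To use it for real, pass the admin directory instead of the scratch one, e.g. clean_stale_rdf /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin.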

Hopefully that will sort your problem. If not, and you are happy to email over some login details/SSH access, I don't mind taking a look for you.

Cheers
Andy
Re: [Andy] error after SSH download
Thanks for your help, I just solved it. The solution was:


1) Delete:

content.rdf.u8
content.rdf.u8.2
content.rdf.u8.gz.1

2) In dmoz_cron.cgi, find:


Code
   # clean up the content.rdf.u8 file...    
&run_rdf_cleanup;


..and comment out, like so:


Code
   # clean up the content.rdf.u8 file...    
# &run_rdf_cleanup;


3) Save the file, and then try re-running from the Gossamer control panel so that it downloads the huge file.

4) Via SSH, run the following command:
perl /hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/do_dump.cgi --rdf=/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/content.rdf.u8 --out=/hsphere/local/home/davidosu/sd14.mysite.com/cgi/admin/dmoz.dump --cat=Regional/Europe/Ireland

It must be that exact command. Running the shorter command given in the Gossamer control panel gives an error.
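As a rough sketch of step 4 with a safety check added: the wrapper below only calls do_dump.cgi when the RDF file is present and non-empty, which is the condition the "Unable to read rdf file" error complains about. `require_rdf` and the `ADMIN` variable are made up for this example; the --rdf, --out and --cat options are the ones from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of DMOZ_Wizard.

# Succeed only if file $1 exists and is not empty.
require_rdf() {
    if [ ! -s "$1" ]; then
        echo "RDF file missing or empty: $1" >&2
        return 1
    fi
}

# ADMIN is an assumed variable; point it at your own admin directory.
ADMIN=${ADMIN:-.}

# Only run the slice when there is something to read.
if require_rdf "$ADMIN/content.rdf.u8"; then
    perl "$ADMIN/do_dump.cgi" \
        --rdf="$ADMIN/content.rdf.u8" \
        --out="$ADMIN/dmoz.dump" \
        --cat=Regional/Europe/Ireland
fi
```

Set ADMIN to the full path of the admin directory so that the command matches the one that worked.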


Last edited by Andy: Apr 9, 2007, 9:22 AM

Re: [cronjob] error after SSH download
I'll operate in dynamic mode (will never use static) is it ok to delete the
content.rdf.u8
AND
content.rdf.u8.2

because it is now getting all its data from the SQL DB, and even if I wanted to go static tomorrow it would compile that from the dmoz.dump file.

Correct?
Re: [cronjob] error after SSH download
Hi,


Quote
I'll operate in dynamic mode (will never use static) is it ok to delete the
content.rdf.u8
AND
content.rdf.u8.2


Sure thing - as long as your DMOZ import is done, these files are fine to remove :)

Cheers
Andy