NETWORK PRESENCE ABOUT SERVICES PRODUCTS TRAINING CONTACT US SEARCH SUPPORT
 



Re: [FW1] Secure SMTP Server



At the risk of receiving a couple hundred damn vacation messages from people on this list who just won't take the hint to turn the damn thing off (the reason I rarely post solutions to people's FireWall-1 problems in public), here are some issues that I've identified with the SMTP security server in FireWall-1 4.1.  CheckPoint only says "This should be fixed in NG". They did confirm, though, that these are problems in the existing 4.1 code.  Our platform is Solaris, so your mileage may vary with NT.

The comments below are taken from messages I sent to CheckPoint support concerning the SMTP security server.
 
 

There are issues with the smtp and mdq processes that I have not only worked
out on my own, but that others running FireWall-1 are seeing as well. Before
we have to stop a firewall to change a value, or CheckPoint support leaves this
trouble ticket open much longer, have you gone through the history of trouble
tickets on the smtp/mdq processes for mail getting stuck in the spool?  There
are many, many people having the exact same problem, so I would expect that
your support has already gone down this route in the past and that at some
point this information has already been collected. If so, what was the outcome
of someone else's issue?

Here are some issues that we have determined with the smtp/mdq processes (some
were already mentioned in a previous message).

1. mdq only processes a limited number of messages at a time and does not
de-prioritize messages that cannot be delivered.  For example, if mdq handled
only 10 messages at a time and 9 were undeliverable, that would leave only 1
slot open for delivering mail. If all 10 were undeliverable, mail would no
longer flow, since every slot is filled and mdq will not de-prioritize
undeliverable messages. This is the reason why support and others on FireWall-1
mailing lists say to move messages out of the spool and slowly move them back.
The actual trick is to move only a small number of the "oldest" messages out of
the spool (I move about 15-20 at a time) until the messages start getting
delivered.

2. mdq does not appear to contain exception handling code for 4xx error
messages. If mdq is delivering a message and the remote mail server sends back
a 4xx error message, such as for a full mailbox or store, the message is not
sent back to the sender to inform them of the problem, as other MTAs do.
Instead, when the smtp connection is closed, that message is still held in a
slot to be delivered, and mdq will once again attempt delivery of the
message. The mdq program will keep attempting to deliver the message over
and over again, completely ignoring the 4xx errors. The same applies to a
short timeout. If a large message is being sent to a mail server on a slow
link and the delivery times out, mdq will turn around and keep trying to send
it again and again.  The message will only be 'returned to sender' after the
abandon_time period, which currently defaults to 5 days.

3. smtp process does not handle lesser priority and round-robin MX records. If
a host had multiple MX records, such as:

MX 10    mail1.domain.com
MX 20    mail2.domain.com

when a message arrives in the spool, the recipient's destination mail server
is looked up and its IP address is written directly into the mail file in the
spool. In this case the IP address for mail1.domain.com would be used to
deliver messages. If mail1.domain.com is down, FireWall-1 will not attempt to
use mail2.domain.com at all. The same situation applies to round-robin MX
records:

MX 10    mail1.domain.com
MX 10    mail2.domain.com
MX 10    mail3.domain.com

Whichever mail server is received in a query is the one that FireWall-1 will
always use to deliver mail. If that mail server was mail3, then the firewall
will only use mail3 to deliver mail.

This is actually one of the biggest reasons why our mail is not being
delivered. We are finding that a large number of the undeliverable messages
are being sent to an unreachable mail server, while all or most of the other
mail servers at the remote end 'are' reachable. I also tried manually changing
the IP address in the mail messages to that of a reachable server; however,
for some unknown reason that information is ignored. I assume that when a
message is read from the spool (it's opened by in.asmtpd first) the recipient
information is re-read and the bad server is used again. After a remote mail
server is selected, the mdq process delivers the message, then closes and
unlinks (removes) the file.
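
For comparison, here is a minimal sketch of the MX handling that issue 3 says
is missing: try the lowest preference value first, fall back to the next one,
and pick randomly among hosts that share a preference. This is not FireWall-1
code. The function names and the try_deliver callback are made up for
illustration, and the records are assumed to have been looked up already.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple

MxRecord = Tuple[int, str]  # (preference, hostname); lower preference wins


def order_mx(records: Iterable[MxRecord],
             rng: Optional[random.Random] = None) -> List[str]:
    """Return hostnames in the order a sender should try them."""
    rng = rng or random.Random()
    by_pref = {}
    for pref, host in records:
        by_pref.setdefault(pref, []).append(host)
    ordered: List[str] = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)  # spread load across equal-preference hosts
        ordered.extend(hosts)
    return ordered


def deliver_with_fallback(records: Iterable[MxRecord],
                          try_deliver: Callable[[str], bool]) -> Optional[str]:
    """Try each MX host in turn; return the host that accepted, or None."""
    for host in order_mx(records):
        if try_deliver(host):  # hypothetical callback: True on success
            return host
    return None
```

With the first example above, a sender written this way would move on to
mail2.domain.com as soon as mail1.domain.com failed, instead of retrying the
one address stored in the spool file.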

Since we already know that there are issues with the in.asmtpd and mdq
processes, any debugging or truss outputs will only show the selection process
for remote mail servers and the mdq processing mail. There are no errors in
truss debugging outputs either other than seeing the 4xx error messages from
remote mail servers that FireWall-1 ignores. The mail problems exist since
deprioritization of messages is not done, there is no multiple MX record
support, and 4xx error messages are not handled. If FireWall-1 performed these
functions (I've never seen an MTA that didn't support these other than
FireWall-1) then there would not be an issue with mail backing up in the spool
directory and this trouble ticket would not be opened.
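
To make the 4xx point concrete, here is a rough sketch of how a delivery agent
could act on the first digit of an SMTP reply code (2xx success, 4xx transient
failure, 5xx permanent failure). The function names and the returned labels
are my own, not anything from FireWall-1. The only value taken from this
thread is the 432000 second abandon_time default.

```python
ABANDON_TIME = 432000  # seconds (5 days), the default quoted in this thread


def classify_reply(code: int) -> str:
    """Map an SMTP reply code to a coarse action based on its first digit."""
    if 200 <= code < 300:
        return "ok"           # positive completion
    if 300 <= code < 400:
        return "continue"     # positive intermediate, e.g. 354 after DATA
    if 400 <= code < 500:
        return "retry_later"  # transient failure: defer, do not hammer
    if 500 <= code < 600:
        return "bounce"       # permanent failure: return to sender
    raise ValueError(f"not an SMTP reply code: {code}")


def next_action(code: int, age_seconds: int,
                abandon_time: int = ABANDON_TIME) -> str:
    """Defer on 4xx, but give up once the message is older than abandon_time."""
    action = classify_reply(code)
    if action == "retry_later" and age_seconds >= abandon_time:
        return "bounce"
    return action
```

The "retry_later" result is the case mdq appears to ignore: the message should
be put aside for a while rather than retried straight away from the same slot.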

Is there work being done on the actual code to fix this problem? Or does the
next version fix it, and if so, where is the documentation that specifically
states so?
 

>>>>>>>>>>>>>>>>>>>>>>>>>>  This is from another message I sent them

The abandon_time parameter sets how long a message will reside in the mail
queue before it is deemed undeliverable and returned to the sender. Most mail
systems set this to 3 or 4 days, so 1 hour is really unacceptable.

Also, 432000 seconds is 120 hours, which is 5 days, not 12 hours.
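
The arithmetic is easy to check. This assumes the value is in seconds, which
is the only unit for which 432000 comes out to 120 hours:

```python
SECONDS_PER_HOUR = 60 * 60
HOURS_PER_DAY = 24

abandon_time = 432000  # value under discussion, assumed to be in seconds

hours = abandon_time / SECONDS_PER_HOUR
days = hours / HOURS_PER_DAY
twelve_hours = 12 * SECONDS_PER_HOUR  # what a 12 hour setting would be

print(hours, days, twelve_hours)
```

A 12 hour setting would be 43200, one digit shorter.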

Even if this value were changed, the message would still need to be returned
to the sender, which would require the dequeuer process to handle the message.
Since it's the dequeuer that is the problem, in that it gets stuck with the
limited number of messages it can handle at a time, the messages would not be
returned to the sender and would instead stay stuck in the queue ($FWDIR/spool,
or rundir in the smtp.conf file).

We did some tests and found that the mdq process only appears to handle a
limited number of messages at a time. It also keeps processing these messages
over and over again until they are delivered, and will never process any other
messages unless one or more of these messages are delivered.

Example: Let's say there are 100 messages in the spool directory. Let's say
the mdq process can handle 10 messages at once (I don't know yet the real
value, but it must be close to this number). It would then take 10 messages
from the spool directory and try to deliver them. If all 10 are delivered then
it will pick 10 more and deliver them. If 5 messages are delivered then it
will take 5 new messages and handle them, and it will keep on trying to
deliver the previous 5 messages that it couldn't deliver. If all 10 messages
are undeliverable (remote site is down, overloaded, rejecting mail, or any
other reason), then the mdq process will keep on trying the same 10 messages
over and over again and will no longer process any new messages until at least
1 message has been delivered. The mdq will then process new messages 1 at a
time (alongside the 9 old ones). If 2 are delivered, it will handle 2 new
messages at a time (and 8 old ones).
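
The example can be put into a small simulation. This is only a model of the
behaviour we observed from the outside, not the real mdq logic: the slot count
of 10 is the same guess as in the text, and every name here is made up.

```python
from collections import deque
from typing import List, Tuple

Message = Tuple[int, bool]  # (message id, can it be delivered right now?)


def simulate_fixed_slots(messages: List[Message], slots: int = 10,
                         rounds: int = 20) -> Tuple[List[int], int]:
    """Model a dequeuer that never releases a slot until delivery succeeds."""
    pending = deque(messages)
    active: List[Message] = []
    delivered: List[int] = []
    for _ in range(rounds):
        # Refill only the slots that were freed by a successful delivery.
        while len(active) < slots and pending:
            active.append(pending.popleft())
        still_stuck = []
        for msg_id, deliverable in active:
            if deliverable:
                delivered.append(msg_id)
            else:
                still_stuck.append((msg_id, deliverable))  # keeps its slot
        active = still_stuck
    return delivered, len(active) + len(pending)


# 10 undeliverable messages at the head of the spool, then 90 good ones.
spool = [(i, i >= 10) for i in range(100)]
delivered, left = simulate_fixed_slots(spool)
print(len(delivered), left)
```

With 10 bad messages at the head of the spool, nothing behind them is ever
attempted, no matter how many rounds are run.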

This is an explanation of what is wrong with the mdq program and why people
run across mail getting stuck in the spool directory. Stopping and starting
will not fix it. Changing the various timer settings will only control how
often the mdq program gets stuck and won't help in delivering old mail. The mdq
program needs to be able to determine that a message cannot be delivered right
away, flag it, go on to another message, and come back to the old message
later. If it did this then it would work fine.
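
A rough sketch of that "flag it and come back later" behaviour, using the same
kind of toy model as the 100-message example: a message that fails goes to the
back of the queue instead of holding on to its slot. The names and the slot
count of 10 are illustrative only, and this is not how mdq is implemented.

```python
from collections import deque
from typing import List, Tuple

Message = Tuple[int, bool]  # (message id, can it be delivered right now?)


def simulate_deprioritizing(messages: List[Message], slots: int = 10,
                            rounds: int = 20) -> Tuple[List[int], int]:
    """Model a dequeuer that sends failed messages to the back of the queue."""
    pending = deque(messages)
    delivered: List[int] = []
    for _ in range(rounds):
        batch = [pending.popleft() for _ in range(min(slots, len(pending)))]
        for msg_id, deliverable in batch:
            if deliverable:
                delivered.append(msg_id)
            else:
                pending.append((msg_id, deliverable))  # flag it, retry later
    return delivered, len(pending)


# 10 undeliverable messages at the head of the spool, then 90 good ones.
spool = [(i, i >= 10) for i in range(100)]
delivered, left = simulate_deprioritizing(spool)
print(len(delivered), left)
```

Here the 90 good messages all go out, and only the 10 bad ones stay queued for
later attempts.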

Also, if the resend_period is too short there's another problem that can
occur (and that we've already seen). If the resend_period is set to 15, you
can start overloading a remote mail server. We had several large messages
being sent to a remote mail server on a slow link. The mdq process grabbed
these messages as the first mail to send, connected to the remote mail server,
and started sending mail. Because the mail took so long to send, or a problem
occurred on the remote mail server, the remote server would send a 4xx error
code back to the firewall and drop the smtp connection. The mdq process would
then almost immediately (15 sec maximum) reconnect to the remote server and
try to send the same messages again, since they are at the top of the mdq
process queue. The cycle then repeats, with the firewall connecting and
sending the mail over and over again without stopping, possibly overloading
the recipient's mail server.  The resend_period should never be less than
several minutes (10-30 min) unless the mdq process is able to deprioritize
mail, which it currently cannot.
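
One common way to avoid hammering a remote server is to lengthen the wait
after each failed attempt. A minimal sketch follows, with a 10 minute floor
taken from the range suggested above. The 4 hour cap and the function name are
assumptions of mine, and as far as I know FireWall-1 4.1 has no such setting.

```python
def retry_delay(attempt: int, base: int = 600, cap: int = 14400) -> int:
    """Seconds to wait before retry number `attempt` (0 = first retry).

    base: floor of 10 minutes, never retry sooner than this.
    cap:  upper bound of 4 hours (an arbitrary illustrative value).
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    return min(cap, base * 2 ** attempt)


for n in range(6):
    print(n, retry_delay(n))
```

Compared with a fixed 15 second resend_period, the remote server sees fewer
and fewer connections the longer it stays in trouble.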
 
 

Ron Atkinson
 

"Schier, Marc" wrote:

 

Dear List,

I have not found any information about the impact of using the Secure SMTP Server feature on Checkpoint FW 1 (hosted on NT4 OS).

Could you please provide me with information about:

- Performance decrease/increase of the FW
- what happens if source or destination Server fails
- can it be designed redundantly (two destination servers, if the first fails)
- ...

Regards

Dipl.-Ing.(FH)
Marc Schier
EADS Germany GmbH
CF/IM/CN
Phone: +49 89 607 34266
FAX:    +49 89 607 96466
mailto:[email protected]

-- General Information : SMTP Addresses have been changed to: *@eads.net  --  Please update your address book accordingly.

begin:vcard 
n:Atkinson;Ron
tel;fax:tel;work:ext 6543
x-mozilla-html:TRUE
org:Internet Operations Center (IOC);Security
version:2.1
email;internet:[email protected]
title:Systems Engineer
adr;quoted-printable:;;200 Galleria Officentre=0D=0ASuite 109;Southfield;MI;48034;US
x-mozilla-cpt:;0
fn:Ron Atkinson
end:vcard


 
   All contents © 2004 Network Presence, LLC. All rights reserved.