Re: Client sqlplus SELECT + SPOOL knocks down server network



On Mar 11, 9:59 am, Carlos <miotromailcar...@xxxxxxxxxxxx> wrote:
On 11 mar, 09:43, Carlos <miotromailcar...@xxxxxxxxxxxx> wrote:





On 10 mar, 18:27, DA Morgan <damor...@xxxxxxxxx> wrote:

Carlos wrote:
On 10 mar, 14:08, DA Morgan <damor...@xxxxxxxxx> wrote:
Carlos wrote:
On 10 mar, 13:21, DA Morgan <damor...@xxxxxxxxx> wrote:
Carlos wrote:
Hello all:
I'm facing a strange problem on a 10.2.0.3 Oracle on Oracle Enterprise
Linux 5 server.
This morning someone was issuing a SELECT from a windows client via
ODBC and suddenly all the server communications broke down: even no
ping from or to that server worked.
Tried stop&start network but no avail. The server appears to be dead
at communications until reboot.
So I tried to repeat the SELECT from sqlplus on a windows client to a
OS file with SPOOL and all the communications on the server are gone
once again. Ping only answers to 127.0.0.1 or its own IP address, and
Linux is telling me that the network card is OK. Stop&start network
didn't fix the problem either.
Client sqlnet.log shows:
Fatal NI connect error 12170.
  VERSION INFORMATION:
   TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
   Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows: Version
10.2.0.1.0 - Production
  Time: 10-MAR-2008 12:11:59
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535
    TNS-12535: Message 12535 not found; No message file for
product=NETWORK, facility=TNS
    ns secondary err code: 12560
    nt main err code: 505
    TNS-00505: Message 505 not found; No message file for
product=NETWORK, facility=TNS
    nt secondary err code: 60
    nt OS err code: 0
  Client address: <unknown>
The select is a two column (varchar2(9), char(1)) select of 11 million
rows aprox.
Could anybody tell me where to search next? (I didn't find anything at
metalink).
TIA.
Cheers.
Carlos.
Some of what you've written seems self-conflicting.
You start off telling us you've 10.2.0.3 and then everything
thereafter indicates 10.2.0.1.
Then you say "dead until reboot." Does that mean it is now
working fine or not?
How about queries using Oracle's native interface SQLNET
rather than ODBC?
Ping only answers to 127.0.0.1, I presume on that machine,
have you verified that other addresses exist?
Have you performed an ipconfig /renew?
Perform an ipconfig /all ... what does it show?
Are you using DHCP?
Can you ping the server from elsewhere on the network?
What are your firewall settings?
What tool is connecting using ODBC?
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damor...@xxxxxxxxxxxxxxxx (replace x with u to respond)
Puget Sound Oracle Users Groupwww.psoug.org
Hello Daniel:
You start off telling us you've 10.2.0.3 and then everything
thereafter indicates 10.2.0.1.
Yup: 10.2.0.3 is the server on EL5. 10.2.0.1 was the windows client
from which I did the test.
Then you say "dead until reboot." Does that mean it is now
working fine or not?
Working fine after reboot... until I issue the SELECT again.
How about queries using Oracle's native interface SQLNET
rather than ODBC?
This is what I did on the windows 10.2.0.1 client with sqlplus.
Ping only answers to 127.0.0.1, I presume on that machine,
have you verified that other addresses exist?
Yes. If I ping the server network IP address from the server itself,
it answers OK, but no answer from any other IP in the network. If I
ping the server from another machine there is no answer.
Are you using DHCP?
Nope.
Can you ping the server from elsewhere on the network?
Nope
What are your firewall settings?
No firewall. Isolated environment.
What tool is connecting using ODBC?
Visual FoxPro. But, as said before, sqlplus bring the server
communications to its knees too.
I'll try the ipconfig (after repeating the whole process from startup
the server again).
Thank you very much.
Cheers.
Carlos.
A few additional thoughts.

1. Upgrade the client to 10.2.0.3
2. Post the SQL statement: I'd like to see it
3. Capture the statement and see if it contains any non-visible
    characters when Fox sends it through
4. What ODBC driver? Try changing drivers

Is it only this one SQL statement or can more than one statement
do it? What happens if you take the problem statement and
reconstruct it from scratch in SQL*Plus starting with the
simplest SELECT single_column FROM single_table and then
incrementally add back its full functionality? When does it
break?
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damor...@xxxxxxxxxxxxxxxx (replace x with u to respond)
Puget Sound Oracle Users Groupwww.psoug.org

Daniel:

I am doing the test from sqlplus (no ODBC this time)

2. Post the SQL statement: I'd like to see it

select /*+ full(a) */
a.col1,
a.col2
from schema.table_name a;

It starts displaying/spooling and then breaks with:

...
955800207 T
ERROR:
ORA-03135: la conexión ha perdido contacto

10078290 filas seleccionadas.

ERROR:
ORA-03114: no conectado a ORACLE

(There are 11339771 rows in the table).

From here on, there is no ping from or to the server.

ifconfig on the server shows:

RX packets:674109 errors:0 dropped:0 overruns:0 frame:0
TX packets:674109 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:53271778 (50.8 MiB) TX bytes:259126296 (247.1 MiB)
Interrupt:169

And it's time to reboot again...

Cheers.

Carlos.

Aha:

https://metalink.oracle.com/metalink/plsql/f?p=130:14:114792536629517....

The title is: Subject:  ORA-03113 Or ORA-03135 Reported for Failover
with Network Outage

The doc id is: Note:455771.1

The date is: 10-MAR-2008

That's today.

The bug number is:
Bug 5758870 Select Type TAF fails with ORA-3113 If Network Cable Pulled off

The solution is:
Problem is fixed from 11g onwards and in 10.2.0.4 Patch set release.
One off patchs for some platforms are available.
Apply one-off Patch 5758870 for your platform, if available.

Are you on a RAC cluster?
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damor...@xxxxxxxxxxxxxxxx (replace x with u to respond)
Puget Sound Oracle Users Groupwww.psoug.org

Daniel:

First of all: thank you very much for your attention and time.

Are you on a RAC cluster?

Nope. Single Instance.

I've taken a look at the metalink note that you kindly pointed, but no
TAF here.

I'm beggining to suspect of a faulty network card. I've noticed that
the problem occurs when there is a long information transfer from
server to client, but not based on size, but on the time the transfer
takes. I checked with FTP and big files, but it takes not too much
time to transfer (the network card and switch are 1000Mbps) and the
error does not arise. But when one sqlplus session takes too long to
pull out a query the error appears in a random fashion: sometimes at
10M rows, sometimes at 7M rows, even at 700000 rows.

The server itself works OK as far as its 'local' network activity
goes: the listener is up & running and the enterprise manager works
too (from a local browser). The same query from a local sqlplus (with
listener connection, no bequeath) works too.

Michael:

This can be tested by moving the system cable to a different switch port

Yes. I thought of that too (switch issues). But the network boy swears
everything is OK. I'll try to convince him to test on other switch.

Thank you all for the suggestions and hints.

Cheers.

Carlos.

Solved (or so it seems)...

After a lot of tests, I discovered that the problem always appeared
when the client and the server were connected to the same switch
(!!!). When they were connected to different switches the query wenk
OK (I think it's because there where 100mbps sections between them: I
suspect of some kind of overflow issue at the network card).

I searched for a linux driver for the network card (Marvell Yukon).
After downloading it, I moved my way through the installation process
(compiling with the Linux source files) and finally 'modprobed' the
driver...

...and the problem has not appeared again since then (fingers
crossed)!

Cheers.

Carlos

We are just starting to take a look at linux.

Modprobed the driver ... what does that mean?
.



Relevant Pages

  • Connection issue, Server IP address wrong
    ... A user has a Desktop machine connected to the network, which has Oracle ... A while back the user decided to install Oracle 10i, ... Server version, for a sales presentation off site. ...
    (microsoft.public.windows.server.sbs)
  • Re: Client sqlplus SELECT + SPOOL knocks down server network
    ... I'm facing a strange problem on a 10.2.0.3 Oracle on Oracle Enterprise ... This morning someone was issuing a SELECT from a windows client via ... ODBC and suddenly all the server communications broke down: ... Linux is telling me that the network card is OK. ...
    (comp.databases.oracle.server)
  • Re: Client sqlplus SELECT + SPOOL knocks down server network
    ... I'm facing a strange problem on a 10.2.0.3 Oracle on Oracle Enterprise ... This morning someone was issuing a SELECT from a windows client via ... ODBC and suddenly all the server communications broke down: ... Linux is telling me that the network card is OK. ...
    (comp.databases.oracle.server)
  • Re: Client sqlplus SELECT + SPOOL knocks down server network
    ... I'm facing a strange problem on a 10.2.0.3 Oracle on Oracle Enterprise ... This morning someone was issuing a SELECT from a windows client via ... ODBC and suddenly all the server communications broke down: ... Linux is telling me that the network card is OK. ...
    (comp.databases.oracle.server)
  • Re: from very early this morning...
    ... I just changed my network topology when I stood up the new file ... server. ... I've got three more NICs installed to make up the remaining ...       network configuration. ...
    (freebsd-questions)

Loading