Re: implementing a database log



On Apr 24, 10:11 pm, "Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote:
"David BL" <davi...@xxxxxxxxxxxx> wrote in message

news:fd63f466-18f7-4986-b378-f5e9f512bbd8@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx





On Apr 22, 11:39 pm, "Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote:
"David BL" <davi...@xxxxxxxxxxxx> wrote in message

news:265621e9-25fd-46d5-9e2d-7a4f63fa84b4@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

On Apr 22, 6:58 am, "Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote:
"Christoph Rupp" <cruppst...@xxxxxxxxx> wrote in message

news:91140c56-1f05-4b5d-b45f-b34920db2051@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Brian,

On Apr 21, 11:00 pm, "Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx>
wrote:
Why not go with #4:

4. a physical log based on modified rows. Whenever a row is modified,
added or removed, it is logged. Then you could also implement row
versioning--just add a row version field to the physical rows. I believe
that this is what snapshot isolation is built on.

It's not an SQL database, I don't even have the notion of "rows", but
basically I think your #4 is the same as my #1 or #2.

No, it isn't. #1 requires the logging of additional records that may not
have been affected by an update. #2 doesn't log the entire changed record,
but only bits and pieces. I would think that limiting the units of change
to individual records--entire records--would simplify the process of marking
and isolating units of work while at the same time guaranteeing consistency.

I don't think an atomic unit of work is always associated with a
change to an individual record. Are you suggesting that transactions
defining arbitrarily large units of work aren't needed?

No, that's not what I'm suggesting. What I'm suggesting is that the atomic
unit of work should be a /set/ of /records/--either the old records in the
case of a before image or the new records in the case of an after image.

Ok but that sounds like the system snapshots an entire table in the
before/after images.

For efficiency one would expect to only store the set of records that
have been added and the set that have been removed by a given
transaction. It is easy to see how we get the inverse which is
required for roll back of an uncommitted transaction during recovery.
However, these logical operations aren't idempotent (at least for bags
of records). How does recovery deal with non-idempotent redo()/undo()
changes in the log?
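
To make the idempotence concern concrete, here is a minimal sketch in Python. The function names (redo_bag, redo_set) and the single-letter records are hypothetical, chosen only for illustration. It replays the same logical "add record b" entry twice, once against a bag and once against a set.

```python
from collections import Counter

def redo_bag(bag, added, removed):
    """Replay a logical log entry against a bag (multiset) of records."""
    result = Counter(bag)
    result.update(added)      # each added record raises its count by one
    result.subtract(removed)  # each removed record lowers its count by one
    return +result            # unary plus drops zero and negative counts

def redo_set(records, added, removed):
    """Replay the same kind of entry against a set of records."""
    return (set(records) - set(removed)) | set(added)

# Replaying the entry twice, as recovery might after a second crash.
bag_once = redo_bag(Counter({"a": 1}), added=["b"], removed=[])
bag_twice = redo_bag(bag_once, added=["b"], removed=[])
bag_idempotent = bag_once == bag_twice   # the bag now holds "b" twice

set_once = redo_set({"a"}, added={"b"}, removed=set())
set_twice = redo_set(set_once, added={"b"}, removed=set())
set_idempotent = set_once == set_twice
```

With bags the second replay changes the state again, so recovery would need some way to know whether an entry was already applied. With sets the replay is harmless.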

Why would there be bags of records? At the physical level, each record has
a specific offset in some file, and that offset would uniquely identify it.
Why would you strip off that identification? Consequently, there wouldn't
be bags of records, only sets of records.

One could say the log records are representing physical changes if
they record the physical locations.


If you start out with a set of records and you know what is to be inserted,
updated, and deleted, you can compute the resulting set of records. If you
start out with the resulting set of records, and you know what was inserted,
updated and deleted in order to arrive at that result, then you can compute
the original set of records. For simplicity, even if it isn't necessarily
the most efficient, what is updated could be implemented as a set of ordered
pairs of records, tying each original record to its replacement. So the log
would consist of a sequence of triples (D, U, I) separated by transaction
markers where D is a set of records that were deleted, U is a set of pairs
of records that were updated, and I is a set of records that were inserted.
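
The (D, U, I) triple described above can be sketched as follows. This is an illustration under assumptions, not a definitive design: the Triple class, apply_triple and invert are hypothetical names, and each record is modelled as an (offset, value) pair so that records form a set rather than a bag.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    deleted: frozenset   # D: records removed
    updated: frozenset   # U: (old_record, new_record) pairs
    inserted: frozenset  # I: records added

def apply_triple(records, t):
    """Compute the resulting set of records from the original set."""
    olds = {old for old, _ in t.updated}
    news = {new for _, new in t.updated}
    return (set(records) - t.deleted - olds) | news | t.inserted

def invert(t):
    """Swap D with I and flip each pair in U to get the inverse triple."""
    flipped = frozenset((new, old) for old, new in t.updated)
    return Triple(deleted=t.inserted, updated=flipped, inserted=t.deleted)

before = {(0, "x"), (1, "y"), (2, "z")}
t = Triple(
    deleted=frozenset({(0, "x")}),
    updated=frozenset({((1, "y"), (1, "y2"))}),
    inserted=frozenset({(3, "w")}),
)
after = apply_triple(before, t)
restored = apply_triple(after, invert(t))
```

Applying the inverse triple to the resulting set gives back the original set, which is the property roll back relies on.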

Now, provided that the log is written before the database, that is, (1)
write the triple to the log, (2) write the database, (3) write the
transaction marker in the log, it should be possible to determine whether
or not what was written to the log actually made it into the database, and
thus it should be possible to roll back any uncommitted transaction.
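
A minimal sketch of roll back under this three-step protocol, with several assumptions that are not in the post: the database is a dict from offset to record value, the log is a list of tuples, only the last triple can lack its marker (one transaction at a time), and the name rollback_uncommitted is hypothetical. Because before images are restored by offset, the sketch does not need to work out how much of step (2) completed.

```python
def rollback_uncommitted(db, log):
    """Undo the trailing transaction if its commit marker is missing.

    db  : dict mapping offset -> record value
    log : list of ("triple", D, U, I) or ("commit",) entries, where
          D and I map offset -> value and U maps offset -> (old, new)
    """
    if not log or log[-1][0] == "commit":
        return db                      # nothing is uncommitted
    _, deleted, updated, inserted = log[-1]
    for offset in inserted:
        db.pop(offset, None)           # remove it whether or not it was written
    for offset, (old, _new) in updated.items():
        db[offset] = old               # restore the before image
    for offset, old in deleted.items():
        db[offset] = old               # put the deleted record back
    return db

# The transaction deletes offset 0, updates offset 1 and inserts offset 2.
# Assume a crash after the delete and update were written but not the insert.
log = [("triple", {0: "x"}, {1: ("y", "y2")}, {2: "w"})]
crashed = {1: "y2"}
recovered = rollback_uncommitted(dict(crashed), log)
recovered_again = rollback_uncommitted(dict(recovered), log)
```

Running the roll back a second time leaves the database unchanged, so a crash during recovery itself does no harm in this simplified model.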

A modern DBMS using WAL will tend to use a "lazy writer" that writes
dirty pages to disk in the background. For performance this will
tend to write dirty pages in a physical order so the disk head moves
uniformly over the platter. This assumes the only constraint between
the writing to the log and the dirty pages is the WAL constraint, i.e.
dirty pages must be written to disk strictly after the associated log
records have been written to disk. To meet this constraint the lazy
writer simply needs to ignore dirty pages that haven't (yet) been
logged.
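
The selection rule for such a lazy writer can be sketched as below. The names pages_to_flush, page_lsn and flushed_lsn are assumptions made for illustration: each dirty page is taken to carry the log sequence number of its latest change, and flushed_lsn is the highest log sequence number already on disk.

```python
def pages_to_flush(dirty_pages, flushed_lsn):
    """Pick the dirty pages that may be written now, in physical order.

    dirty_pages : dict mapping page number -> LSN of the page's latest change
    flushed_lsn : highest LSN whose log record is already on disk
    """
    eligible = [
        page_no
        for page_no, page_lsn in dirty_pages.items()
        if page_lsn <= flushed_lsn   # WAL constraint: log record first
    ]
    return sorted(eligible)          # ascending page number, one sweep

dirty = {7: 120, 2: 95, 5: 101, 9: 80}
batch = pages_to_flush(dirty, flushed_lsn=100)
```

Pages 5 and 7 are skipped for now because their log records are not yet on disk. They become eligible once the log has been flushed past their LSNs.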

Your suggestion imposes far more onerous constraints on the disk
writing policies. In fact to make a transaction durable it seems
necessary to write all the dirty pages to disk as part of the
transaction. This is certainly not normally the case in a modern
DBMS. A database that avoids a "force" policy requires REDO during
recovery.

Often a DBMS that supports long running transactions will allow dirty
pages to be written to disk even though the associated transaction
hasn't yet committed. This is the so-called "steal" policy and leads
to the need for UNDO during recovery.

AFAIK most databases have very flexible disk writing policies (steal/
no-force) and require both UNDO and REDO during recovery. See the
ARIES algorithm.
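
As a rough illustration of why steal/no-force needs both passes, here is a deliberately simplified sketch. It is not ARIES: there are no page LSNs, checkpoints or compensation log records, and the log format and the name recover_steal_no_force are assumptions. Each update record carries a full before and after value, so replaying it is idempotent.

```python
def recover_steal_no_force(disk, log):
    """Redo every logged update, then undo those of uncommitted transactions.

    disk : dict mapping page id -> value found on disk after the crash
    log  : list of ("update", txn, page, before, after) or ("commit", txn)
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    updates = [rec for rec in log if rec[0] == "update"]

    # REDO: no-force means committed changes may be missing from disk.
    for _, txn, page, before, after in updates:
        disk[page] = after

    # UNDO: steal means uncommitted changes may have reached disk.
    for _, txn, page, before, after in reversed(updates):
        if txn not in committed:
            disk[page] = before
    return disk

log = [
    ("update", "T1", "A", 1, 2),
    ("update", "T2", "B", 10, 20),
    ("commit", "T1"),
    ("update", "T2", "A", 2, 3),
]
# At the crash, T1's committed write to A never reached disk (no-force),
# while T2's uncommitted write to B did (steal).
disk_after_crash = {"A": 1, "B": 20}
recovered = recover_steal_no_force(dict(disk_after_crash), log)
```

After recovery, A holds T1's committed value and B is back to its value before T2 started.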
