Wednesday, October 31, 2007

Identify and Recover Deleted Data without a database restore

Customers will come to us and say... "someone deleted/updated some data a while back and we only discovered it now. We don't want to restore a backup because that will undo the work from the last week. Help!"

It may be possible to "rescue" the old data if you still have a suitably older checkpoint somewhere in your database.

First, you need to get a full inventory of all the checkpoints available to you. It is possible that there exist checkpoints in your database that you do not even know about. The following code will give you a full inventory and list the checkpoints (and alternative that they are in) sorted from most recent to oldest creation times.

_pragma(classify_level=debug)
_method ds_version_view.debug!all_checkpoints()
 ## debug!all_checkpoints() : sorted_collection
 ##
 ## returns a sorted_collection of all checkpoints in self and
 ## the tree beneath self sorted from most recent to oldest.
 ## The elements returned are actually simple_vectors of the
 ## form:
 ##       {checkpoint A_CHECKPOINT,string ALTERNATIVE_NAME}
 ##
 _local checkpoints << sorted_collection.new(_unset,
          _proc(a,b)
           >> a[1].time_created > b[1].time_created
          _endproc)
 
 _for a_view _over _self.actual_alternatives()
 _loop
  checkpoints.add_all(a_view.checkpoints.map(_proc(a_cpt)
           _import a_view
           >> {a_cpt,a_view.alternative_path_name()}
          _endproc)) 
 _endloop


 _return checkpoints
_endmethod
$
_pragma(classify_level=debug)
_global show_all_checkpoints <<
_proc@show_all_checkpoints(a_view)
 ## show_all_checkpoints(ds_version_view A_VIEW) : _unset
 ##
 ## shows all checkpoints and related information for A_VIEW and
 ## tree.

 _for cpt_info _over a_view.debug!all_checkpoints().fast_elements()
 _loop
  show(cpt_info[1].time_created,cpt_info[1].checkpoint_name,cpt_info[2])
 _endloop
_endproc
$


To run this code, do something like...

MagikSF> show_all_checkpoints(gis_program_manager.cached_dataset(:gis))
$


... which will give you output like...

date_time(10/30/07 17:34:44) "is!insync_1193790807" :||oracle_sync_parent|
date_time(10/30/07 17:34:44) "is!insync_1193790807" :||oracle_sync_parent|oracle_sync|
date_time(10/30/07 16:32:54) "is!insync_1193787116" :||oracle_sync_parent|
date_time(10/30/07 16:32:53) "is!insync_1193787116" :||oracle_sync_parent|oracle_sync|
date_time(10/30/07 15:33:09) "is!insync_1193783570" :||oracle_sync_parent|
date_time(10/30/07 15:33:08) "is!insync_1193783570" :||oracle_sync_parent|oracle_sync|
date_time(10/30/07 14:34:02) "is!insync_1193780009" :||oracle_sync_parent|
date_time(10/30/07 14:34:02) "is!insync_1193780009" :||oracle_sync_parent|oracle_sync|
date_time(10/30/07 13:33:52) "is!insync_1193776407" :||oracle_sync_parent|


Once you have identified a checkpoint that is old enough to contain your old data, you take one of the two following two approaches (comments in the code explain the differences).

_block
 # if you want to use your GIS tools to see what the data looked
 # like at a previous checkpoint, then get a handle on the
 # current view and go that checkpoint in READONLY mode.

 # DO NOT switch to :writable while at this alternative unless
 # you are ABSOLUTELY SURE of what you are doing.  The safest
 # action to take once you are done viewing data at the old
 # checkpoint is to switch to the "***disk version***"
 # checkpoint in that alternative before continuing.
 _local v << gis_program_manager.cached_dataset(:gis)
 
 v.switch(:readonly)
 v.go_to_alternative("|oracle_sync_parent",:readonly)
 v.go_to_checkpoint("is!insync_1192840224",:readonly)
_endblock
$

... OR ...
_block
 # if you want to keep the main dataset at the current view and
 # then get a second handle to the old_view so that you can copy
 # old data (eg., incorrectly deleted/updated data) into the
 # current view, then get a handle on a REPLICA of the current
 # view and go that previous checkpoint in READONLY mode.

 # DO NOT switch to :writable while at this alternative unless
 # you are ABSOLUTELY SURE of what you are doing.  The safest
 # action to take once you are done viewing data at the old
 # checkpoint is to switch to the "***disk version***"
 # checkpoint in that alternative before continuing.
 _global old_view

 # once you have finished using OLD_VIEW, be sure to :discard()
 # it.
 old_view << gis_program_manager.cached_dataset(:gis).replicate()
 
 old_view.switch(:readonly)
 old_view.go_to_alternative("|oracle_sync_parent",:readonly)
 old_view.go_to_checkpoint("is!insync_1192840224",:readonly)

_endblock
$


If you are going to use data in the old_view to update/recreate data in the current view, then you might want to explore using classes/methods such as
record_transaction.new_update()
record_transaction.new_insert()
ds_collection.at()
ds_collection.insert()
ds_collection.clone_record()
ds_collection.update()
ds_collection.insert_or_update()



[NOTE 2014 June 02: Reinhard Hahn sent me some new code that supercedes the code he originally posted in the comment to this post]

_pragma(classify_level=debug)
_iter _method ds_version_view.all_available_versions(_optional top?)
  ## yields all available Versions of a replicate of
  ## _self in the following order:
  ##
  ## alternatives (if TOP? isnt _false: beginning with ***top***)
  ## -- checkpoints (beginning with :current)
  ## -- base versions for checkpoints (only for alternative_level
  ##    > 0)
  ##
  ## for each version yields
  ## A_VIEW,ALTERNATIVE_PATH_NAME,CHECKPOINT_NAME,BASE_LEVEL, CHECKPOINTS_TIME_CREATED
  ## where A_VIEW is always a replicate, which will be discarded
  ## outside this method. The order of the CHECKPOINTS will be in
  ## reversed chronological order (i.e. beginning with the most
  ## recent one)
  ##
  ## BASE_LEVEL is 0 for the version itself, -1 for the immediate
  ## base version, -2 for the base version of the base version
  ## and so on 
  ##
  ## example:
  ## v,:|***top***|,:current,0,v.time_last_updated()
  ## v,:|***top***|,"cp_1",0,cp_1.time_created
  ## v,:|***top***|,"cp_2",0,cp_2.time_created
  ## v,:||alt1|,:current,0,v.time_last_updated()
  ## v,:||alt1|,:current,-1,v.time_last_updated()
  ## v,:||alt1|,"cp11",0,cp11.time_created
  ## v,:||alt1|,"cp11",-1,v.time_last_updated()
  ## v,:||alt2|,":current",0,v.time_last_updated()
  ## ...
  ##
  ## NB:
  ## If the different versions of _self have different
  ## datamodels, you might be annoyed by messages like the following:
  ##
  ##  compiliing my_table.geometry_mapping
  ##  fixing up details for dd_field_type(my_field_type) : Version 12345, transition to 54321
  ##  Defining my_other_table
  ##
  ## To avoid this, you might want to suppress any output by
  ## redefining the dynamics !output! and suppressing the output
  ## of the information conditions db_synchronisation and
  ## pragma_monitor_info. Of course you have then to put any
  ## wanted output to other output targets like !terminal! or a
  ## locally defined external_text_output_stream. So your code
  ## might look like the following: 
  ## MagikSF> t << proc(a_database_view)
  ##  _handling db_synchronisation, pragma_monitor_info _with procedure
  ##  _dynamic !output!  << null_text_output_stream
  ##
  ##  # ... any more preparation code ...
  ##
  ##  _for r,alt_pname,cp,base_level,cp_time _over a_database_view.all_available_versions()
  ##  _loop
  ##
  ##    # ... my code running in the loop, for example:
  ##    !terminal!.show(alt_pname,cp,base_level,cp_time)
  ##    !terminal!.write(%newline)
  ##    !terminal!.flush()
  ##    # ...
  ##
  ##  _endloop
  ##
  ##  # ... any summary output, for example:
  ##  !terminal!.write("Ready.",%newline)
  ##  !terminal!.flush()
  ##  # ...
  ##
  ## _endproc.fork_at(thread.low_background_priority)
  ##
  ## If you don't want to use these !terminal! calls, you could
  ## also dynamicall redefine !output!:
  ## MagikSF> t << proc(a_database_view)
  ##  _handling db_synchronisation, pragma_monitor_info _with procedure
  ##  _dynamic !output!  << null_text_output_stream
  ##
  ##  # ... any more preparation code ...
  ##
  ##  _for r,alt_pname,cp,base_level,cp_time _over a_database_view.all_available_versions()
  ##  _loop
  ##    _block
  ##      _dynamic !output! << !terminal!
  ##      # ... my code running in the loop, for example:
  ##      show(alt_pname,cp,base_level,cp_time)
  ##      # ...
  ##    _endblock
  ##  _endloop
  ##
  ##  !output! << !terminal!
  ##  # ... any summary output, for example:
  ##  write("Ready.")
  ##
  ## _endproc.fork_at(thread.low_background_priority)

  _local change_alt_level <<
    _if top? _isnt _false 
    _then
      >> - _self.alternative_level
    _else
      >> 0
    _endif
  _local r << _self.replicate(change_alt_level,_true)
  _protect
    _for rr _over r.actual_alternatives(:same_view?,_true)
    _loop
      _local alt_name << rr.alternative_path_name()
#      _if alt_name.write_string.matches?("*{consistency_validation}") _then _continue _endif
      _local cps << rr.checkpoints.as_sorted_collection(_proc(lhs,rhs)
                      _if lhs _is :current
                      _then
                        _if rhs _is :current
                        _then
                          _return _maybe
                        _else
                          _return _true
                        _endif
                      _elif rhs _is :current
                      _then
                        _return _false
                      _endif 
                      _local res <<  rhs.time_created _cf lhs.time_created
                      _if res _isnt _maybe
                      _then
                        _return res
                      _endif
                      _return lhs.checkpoint_name _cf rhs.checkpoint_name
                    _endproc)
      cps.add(:current)
      _for cp _over cps.fast_elements()
      _loop
        _local cp_name
        _local cp_time 
        _if cp _is :current
        _then 
          rr.rollforward()
          cp_name << :current
          cp_time << rr.time_last_updated()
        _else
          cp_name << cp.checkpoint_name
          rr.go_to_checkpoint(cp_name)
          cp_time << cp.time_created
        _endif 
        _loopbody(rr,alt_name,cp_name,0,cp_time)
        _if rr.alternative_level > 0
        _then
          _local rrr << rr.replicate(:base,_true)
          _protect
            _local level_diff << -1
            _loopbody(rrr,alt_name,cp_name,level_diff,rrr.time_last_updated())
            _loop
              _if rrr.alternative_level = 0 _then _leave _endif
              rrr.up_base()
              level_diff -<< 1
              _loopbody(rrr,alt_name,cp_name,level_diff,rrr.time_last_updated())
            _endloop 
          _protection
            rrr.discard()
          _endprotect
        _endif 
      _endloop
    _endloop
  _protection
    r.discard()
  _endprotect
_endmethod
$

1 comment:

Anonymous said...

I use a iter method that has two improvements with respect to your all checkpoints loop. First it uses a replicate view (to be sure not to have much action in the graphics view). Second it tries out the base versions of all checkpoints as well. The result reads as follows:

_pragma(classify_level=debug)
_iter _method gis_ds_view.debug!all_available_versions()
## yields all available Versions of a replicate of
## gis_program_manager.cached_dataset(vn) in the following order:
##
## alternatives (beginning with ***top***)
## -- checkpoints (beginning with :current)
## -- base versions for checkpoints (only for alternative_level
## > 0)
##
## for each version yields
## A_VIEW,ALTERNATIVE_PATH_NAME,CHECKPOINT_NAME,BASE?, CHECKPOINTS_TIME_CREATED
## where A_VIEW is always a replicate, which will be discarded
## outside this method. The order of the CHECKPOINTS will be in
## reversed chronological order (i.e. beginning with the most
## recent one)
##
## example:
## v,:|***top***|,:current,_false,_unset
## v,:|***top***|,"cp_1",_false,cp_1.time_created
## v,:|***top***|,"cp_2",_false,cp_2.time_created
## v,:||alt1|,:current,_false,_unset
## v,:||alt1|,"cp11",_false,cp11.time_created
## v,:||alt1|,:current,_true,_unset
## v,:||alt1|,"cp11",_true,_unset
## v,:||alt2|,":current",_false,_unset
## ...
_local r << _self.replicate(- _self.alternative_level,_true)
_protect
r.goto_top_alternative()
_for rr _over r.actual_alternatives(:same_view?,_true)
_loop
_local alt_name << rr.alternative_path_name()
# _if alt_name.write_string.matches?("*{consistency_validation}") _then _continue _endif

rr.rollforward()
_loopbody(rr,alt_name,:current,_false)
_local cps << rr.checkpoints.as_sorted_collection(_proc(lhs,rhs)
>> lhs.time_created _cf rhs.time_created
_endproc)
_for cp _over cps.elements_in_reverse()
_loop
_local cp_name << cp.checkpoint_name
rr.go_to_checkpoint(cp_name)
_loopbody(rr,alt_name,cp_name,_false,cp.time_created)
_endloop
_if rr.alternative_level > 0
_then
rr.rollforward()
_local rrr << rr.replicate(:base,_true)
_protect
_loopbody(rrr,alt_name,:current,_true)
_protection
rrr.discard()
_endprotect
_for cp _over cps.elements_in_reverse()
_loop
_local cp_name << cp.checkpoint_name
rr.go_to_checkpoint(cp_name)
rrr << rr.replicate(:base,_true)
_protect
_loopbody(rrr,alt_name,cp_name,_true)
_protection
rrr.discard()
_endprotect
_endloop
_endif
_endloop
_protection
r.discard()
_endprotect
_endmethod
$

Cheers Reinhard Hahn