Recover lost data in Linux

Over the past years I wrote several small bash scripts for sorting recovered data, as well as a few articles on this blog about data recovery in Linux. Today I found a few more useful applications for data recovery, and I lost all my data on a disk (due to PEBKAC: problem exists between keyboard and chair). Let’s

As always, this article comes without any warranty and I take no responsibility for anything. My articles are written in the hope of being helpful to you, but they might still contain mistakes, so you should always check what a command does before simply running it. By doing so, you’ll also learn how to use it. Anyway: USE of anything from this article is AT YOUR OWN RISK.

Theory

Data recovery is a wide topic. It ranges from restoring data from an old or failing CD, over a harddisc that is failing due to bad sectors, up to a harddisc whose power pins are broken and need re-soldering. You might also have a disc which is not damaged at all, but whose partitions you accidentally deleted or re-formatted. Whatever you’re dealing with: the first step is to create an image. That’s because a failing disc might get worse if you work directly on it, and even if your disc is fine, you risk destroying useful data when working directly on it. Do yourself a favour and create an image of the disc first. This also means that you need enough storage space for a) storing the image and b) storing the recovered data.
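To check the storage requirement before starting, something like the following sketch could be used. The function names free_bytes and has_room are made up for this example, and the device and directory in the usage comments are placeholders.

```shell
# free_bytes DIR: print the free space (in bytes) of the filesystem holding DIR.
free_bytes() {
    # df -P prints one POSIX-format line per filesystem, -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# has_room NEEDED_BYTES DIR: succeed if DIR has at least NEEDED_BYTES free.
has_room() {
    avail=$(free_bytes "$2")
    [ "$avail" -ge "$1" ]
}

# Usage (placeholders, needs root to query the device):
#   size=$(blockdev --getsize64 /dev/sdb)
#   has_room "$((size * 2))" /mnt/rescue || echo "not enough space for image + recovered files"
```

Asking for twice the disc size is only a rule of thumb for "image plus recovered files"; a backup copy of the image needs room on top of that.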

The following tools might be used to create an image of a harddisc or partition:

tool | link | notes
dd | website | Part of coreutils. By default dd stops at the first I/O error, so it is a bad choice if your disc is failing, corrupted or contains bad sectors.
(g)ddrescue | website | My preferred tool for creating images. It continues on I/O errors and has some useful settings.
dd_rescue | website | Another utility similar to (g)ddrescue; I’m not sure whether it should still be used.
safecopy | website | I can’t say much: the documentation and the examples look fine, but I haven’t had time to try it yet.

Once you have an image of your media, you might want to make a backup of that image as well, because all further work on the image might destroy data in it. The next step in data recovery is to check whether the partitions are still fine and, if not, to restore them. Useful tools for repairing partitions are standard tools available on every Linux system, like „fdisk“. There’s also another tool I stumbled over yesterday, gpart, which looks very promising. If your partitions are fine, you can try a filesystem repair. If all your files are still there afterwards, you can stop here, lucky guy. If that didn’t work, or if the filesystem repair deleted too many files, we need to go deeper, using tools which search the raw harddisc for files. Good tools to recover files are:

tool | link | notes
foremost | website | jpg, gif, png, bmp, avi, exe, mpg, wav, riff, wmv, mov, pdf, ole, doc, zip, rar, htm, cpp
photorec | website | file formats
recoverjpeg | website | JFIF (JPEG)
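Before letting any of these tools loose, the backup copy of the image mentioned above could be made roughly like this. This is only a sketch: backup_image is a name invented for this example, the paths in the usage comment are placeholders, and --sparse=always assumes GNU cp.

```shell
# backup_image IMAGE BACKUP: copy IMAGE to BACKUP, verify the copy, write-protect it.
backup_image() {
    # --sparse=always (GNU cp) keeps long runs of zeros from using disc space
    cp --sparse=always -- "$1" "$2" || return 1
    # compare both files byte by byte; fail if they differ
    cmp -s -- "$1" "$2" || return 1
    # remove write permission so the backup isn't modified by accident
    chmod a-w -- "$2"
}

# Usage (placeholder paths):
#   backup_image /mnt/sdb.ddr /mnt/backup/sdb.ddr.orig
```

All further experiments then run on the working image, and the read-only copy is only touched when the working image needs to be reset.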

Creating an Image of failing media

ddrescue performs all actions automatically, so there might be no need to invoke it several times manually. However, here’s a short description of how to do it:

ddrescue infile outfile logfile

That’s the basic invocation for a possible one-pass scenario.

ddrescue -v -n infile outfile logfile

That’s verbose, and -n skips the slow splitting and retrying of bad areas. This is handy as the first pass of a multipass run, and it is also a bit faster for that reason. Right after that first pass you can do several more passes using -r and -R, for example:

ddrescue -v -r 2 infile outfile logfile

This will try the bad sectors again, with 2 retries.

ddrescue -v -r 2 -R infile outfile logfile

This will try the bad sectors again in reverse copy direction, with 2 retries. Remember: this time there is no -n. Apart from these you can set a few other options, for example:

-b, --block-size=<bytes>      sector size of input device [default 512]
-c, --cluster-size=<sectors>  sectors to copy at a time [128]
-d, --direct                  use direct disc access for input file

The first option should be set to your disc’s sector size, which you can check using:

# ~> fdisk -l /dev/sda | grep "Sector size"
Sector size (logical/physical): 512 bytes / 512 bytes

So in this case, the default is fine. The second option only affects performance (I’d leave it at its default), and the third option is the interesting one. Accessing a disc through the kernel cache is usually faster than accessing it directly, so our first try shouldn’t use -d. Our next tries, where we want to recover „broken“ blocks or sectors, shouldn’t go through the cache, so we use -d there.

Putting this all together, to get the most out of your disc, you can do the following:

# try normally with cache
ddrescue -n -v infile outfile logfile
# avoid cache, retry 2 times
ddrescue -d -v -r 2 infile outfile logfile
# avoid cache, retry 3 times, mark failed blocks as
# non-trimmed again (-M), stay within the logfile's area (-C)
ddrescue -d -v -r 3 -M -C infile outfile logfile
# avoid cache, retry 2 times, reverse copy direction
ddrescue -d -v -r 2 -R infile outfile logfile

If you’re trying to recover a CD or DVD, -R might be more useful. You can also combine the results of different CD readers by using the same logfile and the same outfile with a different infile, i.e.:

ddrescue /dev/cdrom1 outfile logfile
ddrescue /dev/cdrom2 outfile logfile

So if one of your drives reads parts of the CD that the other doesn’t, you might be lucky and end up with a 100% working image of that CD.

That’s why ddrescue is one of the most important tools for me.

Getting the partitions back
Now that there is an image of the disc, we can check its partitions using fdisk or cfdisk. If there aren’t any partitions, or if they’re wrong, we can try to get them back using gpart. You can also re-create your partitions manually if you know them and their exact sizes/settings. First, however, check the partitions:

root@ubuntu:~# fdisk -l /mnt/sdb.ddr
You must set cylinders.
You can do this from the extra functions menu.
 
Disk /mnt/sdb.ddr: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe222307e
 
 Device Boot  Start  End  Blocks  Id  System
root@ubuntu:~#

As you can see, that disc does not contain any partitions. Now let’s take a look at gpart:

root@ubuntu:/mnt# gpart sdb.ddr
 
Begin scan...
Possible partition(Linux ext2), size(94mb), offset(0mb)
Possible partition(Linux ext2), size(9538mb), offset(94mb)

As you can see, gpart was able to get the partitions back by heuristically determining them. gpart takes a long time, though. You can write that back using gpart -W sdb.ddr sdb.ddr.

Another way to restore the partitions is using testdisk. Just issue testdisk sdb.ddr and go through the „analyse“ part. It will ask you for the used partition-table-type (which is most likely either intel/pc or GPT) and then do a fast scan. If that didn’t help you can still use „search deeper“.

Disk sdb.ddr - 500 GB / 465 GiB - CHS 60802 255 63
Analyse cylinder 50754/60801: 83%
 
 
  Linux                    0   1  1    11 254 58     192712
  Linux                   12   0  1  1227 254 63   19535040

Filesystems
Here I assume that your partitions are working and that you can therefore access them with filesystem-check tools. If that is not the case, you can skip this and go to the next section (the hard way).

ext3/ext4

Just run fsck.ext* and take a look at its manpage. For example:

root@ubuntu:~# fsck.ext4 -fD /dev/sdc1
e2fsck 1.41.11 (14-Mar-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information
 
/dev/sdc1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdc1: 81139/30531584 files (1.6% non-contiguous), 96052810/122096374 blocks

Yes, in this case I ran it on the disc directly and not on an image. If you took an image of a single partition, you can run it on that file. If you took an image of a whole harddisc, you can’t do that directly: you need to expose the partitions using kpartx or losetup (as I did for the ntfs disc below).

It might ask you some questions; answer them carefully. After it has finished its work you should carefully check that all your files are there. If they’re not, you’ll need to do this differently (which is why you should work on an image and not directly on the disc).

Depending on the ext version used, there are a few other ways to recover deleted files, so take a look at them (and at the Ubuntu wiki which I linked at the bottom of this page; in my opinion the use of the ext* tools is explained pretty well over there).

tool | link
e2undel | website
extundelete | website

ntfs

There are several different approaches to processing ntfs. For example, you could use a virtual environment to boot the Windows CD and use the rescue tools from its rescue console, like

chkdsk /R

However, sadly that takes ages here and seems to stop at 50%, so I came up with something else. I used losetup to bind the partitions to loop devices and then ran ntfsfix over them.

losetup takes the optional argument -o (offset), which is the partition’s start sector multiplied by the bytes per sector. So basically, if you issue:

fdisk -l /your/image
Disk beate_sdb.ddr: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x3cc7d78b
 
        Device Boot      Start         End      Blocks   Id  System
beate_sdb.ddr1           16065   195318269    97651102+   f  W95 Ext'd (LBA)
beate_sdb.ddr2       195318270   199318454     2000092+  83  Linux
beate_sdb.ddr3   *   199318455   976751999   388716772+   7  HPFS/NTFS/exFAT
beate_sdb.ddr5           16128   195318269    97651071    7  HPFS/NTFS/exFAT

The value we need is in the „Start“ column. At the top you can see that the sector size is 512 bytes. So to bind partitions 3 and 5 we use:

512*199318455=102051048960
512*16128=8257536
losetup /dev/loop0 beate_sdb.ddr -o 102051048960
losetup /dev/loop1 beate_sdb.ddr -o 8257536
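The arithmetic above can also be left to the shell. The following is a small sketch; start_sector and offset_bytes are names I made up for it, and it assumes the fdisk column layout shown above, where a „*“ in the Boot column shifts the Start value one field to the right.

```shell
# start_sector PARTITION: read `fdisk -l` output on stdin and print the
# Start column of the given partition name.
start_sector() {
    awk -v part="$1" '$1 == part { if ($2 == "*") print $3; else print $2 }'
}

# offset_bytes START_SECTOR [SECTOR_SIZE]: byte offset for losetup -o
# (sector size defaults to 512).
offset_bytes() {
    echo $(( $1 * ${2:-512} ))
}

# Usage (needs root for losetup; names taken from the listing above):
#   start=$(fdisk -l beate_sdb.ddr | start_sector beate_sdb.ddr5)
#   losetup -o "$(offset_bytes "$start")" /dev/loop1 beate_sdb.ddr
```

For partition 5 this yields 8257536 and for partition 3 it yields 102051048960, the same values as calculated by hand.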

And now you can run ntfsfix over them:

root@chani /storage/backups/chani # ntfsfix /dev/loop1 
Mounting volume... FAILED
Attempting to correct errors... 
Processing $MFT and $MFTMirr...
Reading $MFT... OK
Reading $MFTMirr... OK
Comparing $MFTMirr to $MFT... OK
Processing of $MFT and $MFTMirr completed successfully.
Setting required flags on partition... OK
Going to empty the journal ($LogFile)... OK
NTFS volume version is 3.1.
NTFS partition /dev/loop1 was processed successfully.

Now run it once again:

root@chani /storage/backups/chani # ntfsfix /dev/loop1 
Mounting volume... OK
Processing of $MFT and $MFTMirr completed successfully.
NTFS volume version is 3.1.
NTFS partition /dev/loop1 was processed successfully.

Mount it:

root@chani ~ # ntfsmount /dev/loop1 /mnt/ -o force
WARNING: Dirty volume mount was forced by the 'force' mount option.

Now go on and take the data you need.

The difficult approach

Sometimes mounting the disc is not possible anymore, but it’s still possible to get some data from it. I always suggest creating an image. The more time you spend on the image, the more data you’ll get out afterwards.

However, if it’s not possible to mount the image, or files are deleted and you want to try „harder“ to get something back, you’ll want to continue reading. I’ll suggest some applications which are useful in data recovery.

Lost Graphics (jpg) – recoverjpeg

Imagine you forgot to take a backup of your stored graphics and you really want them back, because those are pictures of your pets, family or other important-to-you things. If you’re searching for „JPEG“ pictures you might be lucky and can use recoverjpeg, which is a fast way to restore such data. Get recoverjpeg from here. Follow the instructions and run it on the whole disc or image (guys, I already wrote that you should ALWAYS prefer an image over a disc). It doesn’t matter whether your disc/image is mounted or not, though you should keep it unmounted so that the lost data doesn’t get overwritten by something else.

Recoverjpeg will try to get all jpegs and store them in the current directory. So it’s probably better if you do a:

mkdir ~/my_restored_pictures && cd ~/my_restored_pictures

and run recoverjpeg from there.

Some of the pictures will most likely contain wrong colors or artifacts, or be corrupted. Anyway: I was able to bring ~40 000 pictures back and most of them are usable. Compared to the other tools I tested, recoverjpeg seems to be the fastest.

Lost Files (some) – PhotoRec

PhotoRec seems to be a really, really nice tool. I’m still wondering about a few things in PhotoRec, but so far I haven’t looked into them; there might be some answers on their page. The „estimated“ time is simply wrong: when I started PhotoRec on my first disc it showed „estimated time: 20h“. When I came back 7 hours later it showed „estimated time: 21h“, and when I came back 20 hours later it showed „estimated time: 17 hours“.

If your data is really important, the time shouldn’t matter: just run it and hope to get back as much as possible. I got a lot of txt files, and sorting them will be nearly impossible (over 50k files), so hope that you don’t keep important stuff in txt files like I do. mp3s (totally unimportant to me, I just want to note it) are split into several files, at least some of them, so listening to them doesn’t really work. Well, I could try to put them back together using some other Linux tools, of course.

Just get PhotoRec from here. As PhotoRec recovers a lot of other file types too (you can set which file types it should try to restore), this is a really helpful tool. I ran PhotoRec on the whole disc/image without defining a filesystem (well, I set „other“) and without selecting a partition (whole disc). Take a good look at their documentation, it’s well explained. I was able to restore a lot of data using PhotoRec.
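With tens of thousands of recovered files, a first rough sorting by file extension already helps a lot. The following is only a sketch of how that could look: sort_by_ext is a name made up for this example, the paths in the usage comment are placeholders, and mv -n assumes GNU coreutils. The destination should lie outside the source directory.

```shell
# sort_by_ext SRC DST: move every file below SRC into DST/<extension>/
# (files without an extension end up in DST/noext/).
sort_by_ext() {
    find "$1" -type f -exec sh -c '
        dst=$1; shift
        for f in "$@"; do
            name=${f##*/}                      # strip the directory part
            case $name in
                *.*) ext=${name##*.} ;;        # text after the last dot
                *)   ext=noext ;;
            esac
            mkdir -p "$dst/$ext"
            mv -n "$f" "$dst/$ext/"            # -n: never overwrite an existing file
        done
    ' sh "$2" {} +
}

# Usage (placeholder paths; PhotoRec writes into recup_dir.N directories):
#   sort_by_ext /storage/recovered /storage/sorted
```

This doesn’t solve the real problem of finding the important txt file among 50k others, but it separates the pictures from the text files from the rest.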

Another .. Way to restore files – foremost

In general, to restore lost data you need to look at the harddisc in raw format, without a filesystem layer in between. In this raw data you will find everything. Most file formats (as far as I know) start with a specific header, which tells you, or a tool, what kind of file it is. Tools like foremost, which know about these headers, can use them to restore files. So let’s take a closer look at foremost.

You can get foremost here, in case it’s not in your distribution. By the way, there is another similar tool, Scalpel, though it doesn’t seem to be actively developed (not sure). Anyway, you can give it a try.

Say we have a disc at /dev/sdb and want to restore PDFs from it using foremost. We would do:

~ mkdir /sicherung/pdfs
~ cd /sicherung/pdfs
~ foremost -v -t pdf -k 500 -b 1024 -o /sicherung/pdfs -i /dev/sdb

Usually you will use an image of the disc instead of the disc itself. Tools for creating one are ddrescue and dd, as I already explained.
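The header idea that foremost relies on can be illustrated with a few lines of shell. This is a toy sketch and not how foremost itself is implemented. magic_hex and guess_type are names made up for this example, and the four signatures are just the well-known starting bytes of PDF, JPEG, PNG and gzip files.

```shell
# magic_hex FILE N: print the first N bytes of FILE as a hex string.
magic_hex() {
    od -An -tx1 -N"$2" "$1" | tr -d ' \n'
}

# guess_type FILE: guess the file type from its first bytes.
guess_type() {
    case $(magic_hex "$1" 8) in
        25504446*) echo pdf ;;      # "%PDF"
        ffd8ff*)   echo jpeg ;;     # JPEG start-of-image marker
        89504e47*) echo png ;;      # 0x89 "PNG"
        1f8b*)     echo gzip ;;     # gzip magic number
        *)         echo unknown ;;
    esac
}

# Usage (placeholder file name):
#   guess_type recovered_file
```

A carving tool does the same kind of matching, only on every block of the raw disc instead of on the start of a single file, and it also has to work out where each file ends.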

Restore (Repair) archives – (gz) – gzrecovery

gzrecovery (the gzip Recovery Toolkit; as far as I know its command is called gzrecover) can be obtained from here, in case it’s not in your distribution.

Let’s imagine that with the above tools we got some .gz files which each contain a lot of files. These .gz files are broken, because some deleted data was already overwritten on the disc, or something else happened, or the backup tool wasn’t able to restore them fully. We can try to repair such an archive using gzrecovery to get at least some of the files within that .gz back. Look at the documentation of gzrecovery to do so.
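To find out which of the recovered .gz files are actually damaged, gzip’s own test mode can be used first. The following sketch assumes nothing beyond gzip and find; list_broken_gz is a name made up for this example and the path in the usage comment is a placeholder.

```shell
# list_broken_gz DIR: print every *.gz file below DIR that fails gzip's integrity test.
list_broken_gz() {
    find "$1" -type f -name '*.gz' -exec sh -c '
        for f in "$@"; do
            # gzip -t decompresses to nowhere and fails on corrupt or truncated data
            gzip -t "$f" 2>/dev/null || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Usage (placeholder path):
#   list_broken_gz /storage/sorted/gz > broken.list
```

Only the files on that list need a repair attempt. For an archive that is merely cut off at the end, running gzip -dc on it and redirecting the output to a file usually already gives you the part before the damage.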

Other useful things and links

Should be enough to give you help.

Conclusion?

I was really impressed by the fact that it was THAT easy (*cough*) to restore most of my data. I expected some black magic or something. Anyway, I used 5 old harddiscs to restore my data and I got nearly everything back. I wouldn’t recommend using your old harddiscs as backups though.. happily I hadn’t trashed them 🙂

In addition, this shows that even if you „quickformat“ a disc, create a new Linux filesystem on it or make new partitions, people can still restore that data. That’s .. evil. By the way, just to name some ways to securely erase your harddisc, you could try:

dd of=/dev/hda <<< "HERE-IS-NOTHING-TO-ReStOrE"

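Of course you need to replace hda with the disc to clean.

Though… I’m not sure how „secure“ this would be :p As written, it only overwrites the first few bytes of the disc with that short string; to overwrite the whole disc, dd would need an endless input such as /dev/zero. Another way to do this is „shred“, which overwrites a file or device several times to hide its contents. Just google a bit for it.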
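A minimal shred call could look like the sketch below. wipe is just a wrapper name made up for this example, the targets in the usage comments are placeholders, and whatever you point it at is gone afterwards, so triple-check the name.

```shell
# wipe TARGET: overwrite a file or a block device in place using shred (coreutils).
wipe() {
    # -n 1: one pass of random data
    # -z:   add a final pass of zeros
    # -v:   show progress
    shred -v -n 1 -z -- "$1"
}

# Usage (placeholders; the second form destroys the whole disc):
#   wipe ~/secret-notes.txt
#   wipe /dev/sdX
```

As far as I know this is more reliable on a whole device than on single files: the shred manpage itself warns that journaling or copy-on-write filesystems may keep old copies of a file’s data elsewhere.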
