Forum Topic - several ETFS problems: (4 Items)
   
several ETFS problems  
Hi,

We are using ETFS on 2K NAND flash in several of our products.

Quite a few problems have shown up, and I'd be happy to get any feedback on what's happening here.

1. We seem to lose files / have corrupt spare areas...
We copied several files into the ETFS filesystem. After some reboots (without modifying the files in the meantime; we usually shut down cleanly with "shutdown"), ETFS reported corruption when starting/mounting, and some of the files vanished.

Starting ETFS driver for NAND flash...
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474176
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474177
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474178
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474179
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474180
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474181
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474182
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474183
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474184
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474185
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474186
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474187
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474188
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474189
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474190
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474191
fs-etfs-mpc8313erdb512-nand512: Truncating fid 0x18 from 16384 to 0 bytes
Starting ETFS success! 

According to the source code, "readtrans DATAERR" means that the spare area is corrupted (damaged data, detected by a CRC mismatch).

2. I tried to copy a file from ETFS to ETFS (i.e. create a duplicate under a different name). ETFS complained right away that the file wouldn't fit into the partition (which is correct). After that, I got "cannot fork" when trying to start additional processes via the shell. However, according to Momentics/qconn there was no unusual number of processes, so it can't be the process table being full...
This could only be fixed by rebooting.
Interestingly enough, after rebooting and trying the same command again, ETFS did _NOT_ complain that the file wouldn't fit into the partition and started copying happily.

3. I started copying the file as described above (to the filename "x"), canceled the copy (Ctrl-C), and deleted the created file using "rm x". Right after rm exited, I powered down and rebooted. Upon booting/mounting, ETFS complained about "readtrans DATAERR" on several clusters and reported a fid as truncated to 0. When I looked into the /fs/etfs dir, the "x" file was present again, with a file size of 0. How can that be? I thought file system actions were handled as transactions? Either the file is deleted or it isn't? Also, I'm wondering why ETFS has a problem when I power down (without shutdown) after "rm" has exited. As far as I understand from the ETFS descriptions, ETFS has no independently running "background worker" thread but does everything "inline" with normal requests, so shouldn't "rm" returning mean that the transaction has finished and the file has been deleted correctly?


On a different note, I'm still a bit concerned about no ECC being used on the spare area. I had already enquired about this some time ago but got no real reply and forgot about it since. Now the above problems have poked my nose into this issue again... Is this lack of ECC on the spare area a problem? Why/why not? Are there any plans to fix it in case it is a problem?

Greetings,
 Marc
Re: several ETFS problems  
Copy of what I just posted on the "Anything new" thread...

Marc-

Check out my hack on topic 6546. We had some boards failing to boot because of spare area read errors in the raw partition. Implementing this logic in our IPL resurrected a bunch of boards. I also use this hack in my formatted partition. The performance hit is not so bad, because you check the transaction CRC first and skip the hack if all is OK.

QNX PR# 65567 will address this issue in the future (6.4.1).  I used our priority support to help get this investigated.



http://community.qnx.com/sf/discussion/do/listPosts/projects.filesystems/discussion.general.topc6546

Re: several ETFS problems  
Marc-

Sorry, previous references were to NAND 2048.  Here is the hack for NAND 512.

The following code snippet from devio.c devio_readcluster() will prevent a DATAERR from occurring in the case of a 
single bit releasing from a 0 to a 1 in the spare area:



	else if (dev->crc16((uint8_t *) sp1, sizeof(*sp1) - sizeof(sp1->crctrans)) != sp1->crctrans) {
		// Try the brute-force contingency of walking a 0 through the
		// spare area, since the failure mode is a single bit releasing
		// from 0 to 1.
		unsigned i, j;
		uint32_t mask;
		uint32_t *ptr32 = (uint32_t *) sp1;

		// Iterate through the spare area 32 bits at a time.
		for (j = 0; j < sizeof(struct spare1) / sizeof(uint32_t); j++, ptr32++) {
			for (i = 0; i < 8 * sizeof(uint32_t); i++) {
				// Unsigned shift: 1 << 31 on a signed int is undefined.
				mask = (uint32_t)1 << i;
				// Only a 1 can be the bad bit, so only try turning 1's into 0's.
				if ((*ptr32 & mask) == mask) {
					*ptr32 &= ~mask;
					// Retry the CRC of the spare area.
					if (dev->crc16((uint8_t *) sp1, sizeof(*sp1) - sizeof(sp1->crctrans)) == sp1->crctrans) {
						// Found the bad bit in the spare area.
						dev->log(_SLOG_ERROR, "readcluster trans DATAERR FORCED CORRECTION on cluster %d", cluster);
						trp->tacode = ETFS_TRANS_OK;
						goto spare_area_512_corrected;
					}
					// This wasn't the bad bit, so undo the change.
					*ptr32 |= mask;
				}
			}
		}
		// Falling through means we were not able to correct the problem.
		dev->log(_SLOG_ERROR, "readcluster trans DATAERR on cluster %d", cluster);
		trp->tacode = ETFS_TRANS_DATAERR;
	} else
		trp->tacode = ETFS_TRANS_OK;

spare_area_512_corrected:
Re: several ETFS problems  
Thanks, David! I'd already seen it there.

It actually is 2K NAND; we just haven't renamed the project yet from the original 512-byte-NAND ETFS devio layer...