foundry27 : Post

Forum Topic - several ETFS problems: (4 Items)

Marc Roessler

04/17/2009 10:39 AM

post27353

several ETFS problems

Hi,

We are using ETFS on 2K NAND flash in several of our products.

Quite a few problems have shown up, and I'd appreciate any feedback on what's happening here.

1. We seem to lose files / have corrupt spare areas...
We copied several files into the ETFS filesystem. After some reboots (without modifying the files in the meantime; we
usually shut down cleanly with "shutdown"), ETFS reported corruption when starting/mounting, and some of the files had
vanished.

Starting ETFS driver for NAND flash...
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474176
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474177
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474178
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474179
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474180
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474181
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474182
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474183
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474184
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474185
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474186
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474187
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474188
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474189
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474190
fs-etfs-mpc8313erdb512-nand512: readtrans DATAERR on cluster 474191
fs-etfs-mpc8313erdb512-nand512: Truncating fid 0x18 from 16384 to 0 bytes
Starting ETFS success!

According to the source code, "readtrans DATAERR" means that the spare area is corrupted (the data stored there is
damaged, as detected by a CRC mismatch).

2. I tried to copy a file from ETFS to ETFS (i.e. create a duplicate under a different name). ETFS complained right
away that the file wouldn't fit into the partition (which is correct). After that, I got "cannot fork" when trying to
start additional processes from the shell. However, according to Momentics/qconn there was no unusual number of
processes, so it can't be the process table being full...
This could only be fixed by rebooting.
Interestingly enough, after rebooting and trying the same command again, ETFS did _NOT_ complain that the file wouldn't
fit into the partition and started copying happily.

3. I started copying the file as described above (to the filename "x"), canceled the copy (Ctrl-C), and deleted the
created file using "rm x". Right after rm exited, I powered down and rebooted. Upon booting/mounting, ETFS complained
about "readtrans DATAERR" on several clusters and reported a fid as truncated to 0. When looking into the /fs/etfs
directory, the "x" file was present again, with a file size of 0. How can that be? I thought file system operations were
handled as transactions: either the file is deleted or it is not. I'm also wondering why ETFS has a problem when
powering down (without shutdown) after "rm" has exited. As far as I understand from the ETFS descriptions, ETFS has no
independently running "background worker" thread but does everything "inline" with normal requests, so shouldn't "rm"
returning mean that the transaction has finished and the file has been deleted correctly?

On a different note, I'm still a bit concerned that no ECC is used on the spare area. I had already enquired about
this some time ago but got no real reply and forgot about it since. Now these problems have brought the issue back to
my attention... Is this lack of ECC on the spare area a problem? Why or why not? Are there any plans to fix it in case
it is a problem?

Greetings,
Marc

David Thompson(deleted)

04/17/2009 12:02 PM

post27375

Re: several ETFS problems

Copy of what I just posted on the "Anything new" thread:

Marc-

Check out my hack on topic 6546. We had some boards failing to boot because of spare area read errors in the raw
partition. Implementing this logic in our IPL resurrected a bunch of boards. I also use this hack in my formatted
partition. The performance hit is small because the transaction CRC is checked first, and the hack is skipped if
the CRC matches.

QNX PR# 65567 will address this issue in the future (6.4.1).  I used our priority support to help get this investigated.



http://community.qnx.com/sf/discussion/do/listPosts/projects.filesystems/discussion.general.topc6546

David Thompson(deleted)

04/17/2009 12:11 PM

post27377

Re: several ETFS problems

Marc-

Sorry, previous references were to NAND 2048.  Here is the hack for NAND 512.

The following code snippet from devio_readcluster() in devio.c will prevent a DATAERR from occurring when a single
bit in the spare area releases (flips) from 0 to 1:



	else if(dev->crc16((uint8_t *) sp1, sizeof(*sp1) - sizeof(sp1->crctrans)) != sp1->crctrans) {
		// Try the brute-force contingency of walking a 0 through the
		// spare area, since the failure mode is a single bit
		// releasing from 0 to 1
		unsigned i, j;
		uint32_t mask;
		uint32_t *ptr32 = (uint32_t *) sp1;

		// Iterate through the spare area 32 bits at a time
		for(j = 0; j < sizeof(struct spare1) / sizeof(uint32_t); j++, ptr32++) {
			for(i = 0; i < 8 * sizeof(uint32_t); i++) {
				// Unsigned shift: 1 << 31 on a signed int is undefined behaviour
				mask = (uint32_t) 1 << i;
				// Only need to try turning 1s into 0s
				if((*ptr32 & mask) == mask) {
					*ptr32 &= ~mask;
					// Retry the CRC of the spare area
					if(dev->crc16((uint8_t *) sp1, sizeof(*sp1) - sizeof(sp1->crctrans)) == sp1->crctrans) {
						// Error in spare area found and corrected
						dev->log(_SLOG_ERROR, "readcluster trans DATAERR FORCED CORRECTION on cluster %d", cluster);
						trp->tacode = ETFS_TRANS_OK;
						goto spare_area_512_corrected;
					}
					// This wasn't the bit in error, so undo the change
					*ptr32 |= mask;
				}
			}
		}
		// Falling through means we were not able to correct the problem
		dev->log(_SLOG_ERROR, "readcluster trans DATAERR on cluster %d", cluster);
		trp->tacode = ETFS_TRANS_DATAERR;

	} else
		trp->tacode = ETFS_TRANS_OK;

spare_area_512_corrected:
	;	// a label must be followed by a statement; the rest of devio_readcluster() continues here

Marc Roessler

04/20/2009 4:01 AM

post27451

Re: several ETFS problems

Thanks, David! I had already seen it there.

It actually is 2K NAND; we just haven't yet renamed the project, which started from the original 512-byte-NAND
ETFS devio layer...
