MLUG: Re: [MLUG] String manipulation in C

On Sun, 1 Jul 2007, Jack Smith wrote:

You hit the nail right on the head with what I need to do, Dr. Smith. My project is doing gene sequence to DNA probe mapping. I have a file with 600k lines of probes, each 5-50 base pairs (letters) long, and I need to find sequences in the chromosomal DNA that are identical to the probe sequences. The chromosomal DNA sequence fragments are roughly 500-1500 bp long, and there are about 29k of them. I need to see every match between the probes and the chromosomal DNA, as well as where in that DNA sequence each match occurs. In short, I want something like this:

Probename  Sequence    Gene name   Match Start BP   Match End BP
Probe1     AAGGCC      Gene1       50               55
Probe1     AAGGCC      Gene1       95               100
Probe2     CCGACGT     Gene1
Probe3     [AG]CCT     Gene1       65               68

So the big job is essentially grepping 600,000 times through a file of 29,000 lines. That will probably never be a fast job, but if you only need to run it once and can dedicate a computer to it, you'll likely be done sooner if you don't spend a lot of time optimizing the code.
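As a rough illustration of the unoptimized approach, here is a minimal sketch in C that finds every occurrence of one literal probe in one sequence using strstr() and reports 1-based, inclusive start and end positions as in the table above. The names struct match, find_matches, and print_matches are made up for this example, and the 64-match buffer is an arbitrary assumption. It only handles literal probes; a pattern like [AG]CCT would need something else, such as the POSIX regcomp()/regexec() functions.

```c
#include <stdio.h>
#include <string.h>

/* One match of a probe within a sequence; positions are 1-based, inclusive. */
struct match {
    size_t start;
    size_t end;
};

/*
 * Find every occurrence of the literal string `probe` in `seq`,
 * including overlapping ones. Stores at most `max` matches in `out`
 * and returns the total number of matches found (which may exceed `max`).
 * Hypothetical helper, not from the original thread.
 */
size_t find_matches(const char *seq, const char *probe,
                    struct match *out, size_t max)
{
    size_t len = strlen(probe);
    size_t count = 0;
    const char *p = seq;

    if (len == 0)
        return 0;               /* an empty probe would match everywhere */

    while ((p = strstr(p, probe)) != NULL) {
        if (count < max) {
            out[count].start = (size_t)(p - seq) + 1;
            out[count].end = out[count].start + len - 1;
        }
        count++;
        p++;                    /* step one base so overlaps are found too */
    }
    return count;
}

/* Print one tab-separated row per match for a single probe/gene pair. */
void print_matches(const char *probe_name, const char *probe,
                   const char *gene_name, const char *seq)
{
    struct match m[64];         /* assumed upper bound on matches shown */
    size_t n = find_matches(seq, probe, m, 64);
    size_t shown = n < 64 ? n : 64;
    size_t i;

    for (i = 0; i < shown; i++)
        printf("%s\t%s\t%s\t%zu\t%zu\n",
               probe_name, probe, gene_name, m[i].start, m[i].end);
}
```

Calling print_matches() once per probe inside a loop over all the sequences is the brute-force "grep 600,000 times" version of the job: slow, but simple to get right for a one-off run.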


Mike
