On Sun, 1 Jul 2007, Jack Smith wrote:
You hit the nail right on the head with what I need to do, Dr. Smith. My
project is doing gene sequence to DNA probe mapping. I have a file with
600k lines of 5-50 base-pair (letter) probes and I need to see whether
any probe's sequence appears exactly in the DNA sequences. The
chromosomal DNA sequence fragments are roughly 500-1500
bp long and there are about 29k of them. I need to see any and all
matches between the probes and the chromosomal DNA as well as where in
that DNA sequence the match occurs. In short, I want something like
this:
Probename  Sequence  Gene name  Match Start BP  Match End BP
Probe1     AAGGCC    Gene1      50              55
Probe1     AAGGCC    Gene1      95              100
Probe2     CCGACGT   Gene1
Probe3     [AG]CCT   Gene1      65              68
So the big job is essentially grepping 600,000 times through a file of
29,000 lines. That will probably never be a particularly fast job, but if
you only need to run it once and can dedicate a computer to it, you'll
probably be done sooner by just letting a simple version run than by
spending a lot of time optimizing the code.
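
For what it's worth, a rough sketch of the unoptimized approach in Python
might look like the following. Everything in it is an assumption made for
illustration: the function name find_probe_matches, the in-memory lists of
(name, sequence) pairs, and the made-up test sequence are not from the
real data. It treats each probe as a regular expression (so [AG]CCT works
as written) and reports 1-based, inclusive start and end positions to
match the table above.

```python
import re

def find_probe_matches(probes, genes):
    """Yield (probe_name, probe_seq, gene_name, start, end) for every match.

    probes and genes are iterables of (name, sequence) pairs (an assumed
    layout). Positions are 1-based and inclusive.
    """
    for probe_name, probe_seq in probes:
        # Wrapping the probe in a lookahead with a capture group lets
        # finditer report overlapping matches as well.
        pattern = re.compile("(?=(" + probe_seq + "))")
        for gene_name, dna in genes:
            for m in pattern.finditer(dna):
                start = m.start(1) + 1  # convert 0-based index to 1-based
                end = m.end(1)          # exclusive 0-based end == inclusive 1-based end
                yield probe_name, probe_seq, gene_name, start, end

# Made-up example data, not real probes or genes.
probes = [("Probe1", "AAGGCC"), ("Probe2", "CCGACGT"), ("Probe3", "[AG]CCT")]
genes = [("Gene1", "TTAAGGCCTTGCCTAAGGCC")]

for row in find_probe_matches(probes, genes):
    print("\t".join(str(field) for field in row))
```

Note that this only prints probes that match somewhere; a probe with no
hits (like Probe2 here) produces no line at all, so it would need extra
handling if the blank row in the table above is wanted.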
Mike