bash - Replacing VCG1 or VCG2 with VCG* in Perl script
With the help of jaypal in a previous question (https://stackoverflow.com/a/25735444/3767980) I was able to format my restraints for both the ambiguous and unambiguous cases. Let's consider only the ambiguous ones here, since they are more difficult.
I have restraints like
    g6n-d5c-?: (116.663, 177.052, 29.149) k87cd/e85cb/e94cb/h32cb/q21cb
    l12n-t11c-?: (128.977, 175.109, 174.412) k158c/h60c/a152c/n127c/y159c(noth60c)
    k14n-e13c-?: (117.377, 176.474, 29.823) i187cg1/v78cg2
    a75n-q74c-?: (123.129, 177.253, 23.513) v131cg1/v135cg1/v78cg1
and subjected them to the following Perl script:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use autodie;

    open my $fh, '<', $ARGV[0];

    while (<$fh>) {
        my @values = map { /.(\d+)(\w+)/; $1, $2 } split '/', (split)[-1];
        my ( $resid, $name ) = /^[^-]+-.(\d+)(\w+)-/;
        print "assign (resid $resid , name $name ) (";
        print join ( " or ", map { "resid $values[$_] , name $values[$_ + 1]" } grep { not $_ % 2 } 0 .. $#values );
        print " ) 3.5 2.5 8.5 ! $_";
    }
with output:
    assign (resid 5 , name c ) (resid 87 , name cd or resid 85 , name cb or resid 94 , name cb or resid 32 , name cb or resid 21 , name cb ) 3.5 2.5 8.5 ! g6n-d5c-?: (116.663, 177.052, 29.149) k87cd/e85cb/e94cb/h32cb/q21cb
    assign (resid 11 , name c ) (resid 158 , name c or resid 60 , name c or resid 152 , name c or resid 127 , name c or resid 159 , name c ) 3.5 2.5 8.5 ! l12n-t11c-?: (128.977, 175.109, 174.412) k158c/h60c/a152c/n127c/y159c(noth60c)
    assign (resid 13 , name c ) (resid 187 , name cg1 or resid 78 , name cg2 ) 3.5 2.5 8.5 ! k14n-e13c-?: (117.377, 176.474, 29.823) i187cg1/v78cg2
    assign (resid 74 , name c ) (resid 131 , name cg1 or resid 135 , name cg1 or resid 78 , name cg1 ) 3.5 2.5 8.5 ! a75n-q74c-?: (123.129, 177.253, 23.513) v131cg1/v135cg1/v78cg1
- What I need is to find the lines containing entries that begin with `v`, followed by 2 or 3 digits and then `cg1` or `cg2`, after the `!`. Examples are v78cg2 or v135cg1.
- I need the restraints corresponding to these entries to be treated as a wildcard. I need the restraints returned like:
    assign (resid 5 , name c ) (resid 87 , name cd or resid 85 , name cb or resid 94 , name cb or resid 32 , name cb or resid 21 , name cb ) 3.5 2.5 8.5 ! g6n-d5c-?: (116.663, 177.052, 29.149) k87cd/e85cb/e94cb/h32cb/q21cb
    assign (resid 11 , name c ) (resid 158 , name c or resid 60 , name c or resid 152 , name c or resid 127 , name c or resid 159 , name c ) 3.5 2.5 8.5 ! l12n-t11c-?: (128.977, 175.109, 174.412) k158c/h60c/a152c/n127c/y159c(noth60c)
    assign (resid 13 , name c ) (resid 187 , name cg1 or resid 78 , name cg* ) 3.5 2.5 8.5 ! k14n-e13c-?: (117.377, 176.474, 29.823) i187cg1/v78cg2
    assign (resid 74 , name c ) (resid 131 , name cg* or resid 135 , name cg* or resid 78 , name cg* ) 3.5 2.5 8.5 ! a75n-q74c-?: (123.129, 177.253, 23.513) v131cg1/v135cg1/v78cg1
I need advice on selecting the matching lines and applying the transformation to the cluster input (the part before the `!`). I can find the lines that match with a basic regex of `v.*cg[1-2]`.
I would like a solution within the above Perl script.
If anything is unclear, please comment. I am still new to this. Thank you in advance for any advice.
Here is a modified version of your script with an explanation of what is going on. The line `my @values = map { ... } split '/', (split)[-1];` is a little tricky to understand, so I'll explain it separately:
`map` takes an array, applies whatever is within the braces to every member of the array, and outputs a new array. Two `split`s are used to chop up the line. If used without arguments, `split` takes `$_` as its input and splits on whitespace. Therefore, the first `split` takes `$_`, the current line, and splits it on spaces:
    input: 'g6n-d5c-?: (116.663, 177.052, 29.149) k87cd/e85cb/e94cb/h32cb/q21cb'
    array created by calling split:
    'g6n-d5c-?:', '(116.663,', '177.052,', '29.149)', 'k87cd/e85cb/e94cb/h32cb/q21cb'
The second `split` chops its input on `/`; as input, it uses the last item in the array created by the first `split`. That is, `(split)` is shorthand for "the array created by splitting `$_` on whitespace", and `(split)[-1]` is the last element of that array.
    input: k87cd/e85cb/e94cb/h32cb/q21cb
    array created by calling `split "/"`:
    'k87cd', 'e85cb', 'e94cb', 'h32cb', 'q21cb'
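The two steps can be tried in isolation. This is only a small sketch, using one sample line from the question and the made-up variable names `@fields` and `@entries`; it spells out the implicit `$_` so each step is visible:

```perl
use strict;
use warnings;

# One sample restraint line, copied from the question.
my $line = 'g6n-d5c-?: (116.663, 177.052, 29.149) k87cd/e85cb/e94cb/h32cb/q21cb';

# Step 1: split on whitespace (this is what a bare `split` does to $_).
my @fields = split ' ', $line;

# Step 2: take the last field and split it on '/'.
my @entries = split '/', $fields[-1];

print scalar(@fields), " fields, last one: $fields[-1]\n";
print "entry: $_\n" for @entries;
```

Written as `split '/', (split)[-1]`, the same two steps happen in one expression, with `$_` standing in for `$line`.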
The `map` command applies a regex to every member of the array:
    /.(\d+)(\w+)/;  # match any character (.) followed by 1 or more digits (\d)
                    # followed by 1 or more alphanumeric (\w) characters.
The brackets capture the results into the read-only variables `$1` and `$2`. The second statement in the `map` block adds those captured values to the array being created by the `map` command. By default, Perl puts the result of the last statement into the array, like this:
    my @arr = (1, 2, 3, 4);
    my @two_times = map { $_ * 2 } @arr;
    # @two_times is (2, 4, 6, 8)
(The "results" of the pattern match are `$1` and `$2`, so the statement `$1, $2` that adds them to the `@values` array is not strictly necessary.)
So `my @values = map { /.(\d+)(\w+)/; $1, $2 } @array` captures the matches for each element in `@array` and puts them in `@values`.
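To see that the explicit `$1, $2` and the bare match give the same list, here is a small sketch. The array names `@explicit` and `@implicit` are made up for illustration, and the three entries are taken from the sample data:

```perl
use strict;
use warnings;

my @entries = ('k87cd', 'e85cb', 'y159c(noth60c)');

# Explicit form: run the match, then return $1 and $2.
my @explicit = map { /.(\d+)(\w+)/; $1, $2 } @entries;

# Implicit form: a match in list context returns its captures.
my @implicit = map { /.(\d+)(\w+)/ } @entries;

print "@explicit\n";    # 87 cd 85 cb 159 c
print "@implicit\n";    # 87 cd 85 cb 159 c
```

One difference worth knowing: if an element does not match, the implicit form contributes nothing for it, while the explicit form returns whatever `$1` and `$2` held from an earlier successful match.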
I hope the rest of the script is understandable; if not, I recommend taking apart each command and using `Data::Dumper` to examine the results, so you can work out what is going on.
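For instance, something along these lines would dump each intermediate array. It is a sketch only: the variable names are invented, and setting `$Data::Dumper::Indent` is just a matter of taste to make the output more compact:

```perl
use strict;
use warnings;
use Data::Dumper;

my $line = 'k14n-e13c-?: (117.377, 176.474, 29.823) i187cg1/v78cg2';

# Take the pipeline apart one command at a time.
my @fields  = split ' ', $line;
my @entries = split '/', $fields[-1];
my @values  = map { /.(\d+)(\w+)/ } @entries;

# Indent = 1 gives a more compact layout than the default.
$Data::Dumper::Indent = 1;

# Passing references keeps each array as one structure in the dump.
print Dumper( \@fields, \@entries, \@values );
```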
To alter the script to treat the vNNcg1 / vNNcg2 entries differently, I added a line to the `map` command that finds any residue matching that pattern and replaces it with vNNcg*. I also altered the matching regex so that it grabs the appropriate pieces of the residue name but does not grab inappropriate data (such as `(notb28dg)`). Here is the new script with comments:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature ':5.10';
    use autodie;

    open my $fh, '<', $ARGV[0];

    while (<$fh>) {
        chomp;    # drop the trailing newline; "say" adds one back when printing

        # brief guide to regexps:
        # \d = digits
        # \w = digits or letters or _
        # [ ] = match any of the characters within these brackets
        # ( ) = capture the value in these brackets, and save it as $1, $2, $3, etc.
        #       (brackets are also used for alternation, but not in this case)
        # * = match 0 or more times
        # + = match 1 or more times
        # \* = match the character *
        # s/ / / = search and replace
        # /x = ignore whitespace

        my @values = map {
            # find this pattern
            s/v         # v
              (\d+)     # 1 or more digits; the brackets capture the value
                        # as the first capture group
              cg        # cg
              [12]      # either 1 or 2
             /v$1cg*/x; # replace with v, the captured digits, cg and *

            # find this pattern
            /.                  # any character
             (\d+)              # 1 or more digits; captured as group one
             ([a-z][\w\*]*)     # a letter followed by 0 or more alphanumerics or *
            /x;                 # this value is captured in $2

            # put $1 and $2 into the array we're building
            $1, $2
        } split '/', (split)[-1];

        my ( $resid, $name ) = /^[^-]+-.(\d+)(\w+)-/;

        # compose the new string
        my $str = "assign (resid $resid , name $name ) ("
            . join ( " or ", map { "resid $values[$_] , name $values[$_ + 1]" } grep { not $_ % 2 } 0 .. $#values )
            . " ) 3.5 2.5 8.5 ! $_";

        # "say" prints the string to STDOUT and automatically adds a newline
        say $str;
    }
A short version of the 'core' of the script without the comments:
    foreach (@data) {
        my @values = map {
            s/v(\d+)cg[12]/v$1cg*/;
            /.(\d+)([a-z][\w\*]*)/;
        } split '/', (split)[-1];
        my ( $resid, $name ) = /^[^-]+-.(\d+)(\w+)-/;
        say "assign (resid $resid , name $name ) ("
            . join ( " or ", map { "resid $values[$_] , name $values[$_ + 1]" } grep { not $_ % 2 } 0 .. $#values )
            . " ) 3.5 2.5 8.5 ! $_";
    }
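If you want to try the substitution without an input file, the following self-contained sketch applies the same two regexes to two of the sample lines. The helper name `format_restraint` and the `@data` contents are assumptions made for this example, not part of the script above, and it uses a plain loop instead of the `map`/`grep` index trick:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): formats one line.
sub format_restraint {
    my ($line) = @_;
    chomp $line;

    # Residue number and atom name of the second residue, e.g. "e13c".
    my ( $resid, $name ) = $line =~ /^[^-]+-.(\d+)(\w+)-/;

    my @pairs;
    for my $entry ( split '/', ( split ' ', $line )[-1] ) {
        # Turn vNNcg1 / vNNcg2 into vNNcg*; other entries are left alone.
        $entry =~ s/v(\d+)cg[12]/v$1cg*/;

        # Pull out the residue number and the atom name (which may end in *).
        my ( $r, $n ) = $entry =~ /.(\d+)([a-z][\w*]*)/
            or next;    # skip anything that does not look like an entry
        push @pairs, "resid $r , name $n";
    }

    # $line itself was never modified, so the comment keeps the raw entries.
    return "assign (resid $resid , name $name ) ("
        . join( ' or ', @pairs )
        . " ) 3.5 2.5 8.5 ! $line";
}

my @data = (
    'k14n-e13c-?: (117.377, 176.474, 29.823) i187cg1/v78cg2',
    'a75n-q74c-?: (123.129, 177.253, 23.513) v131cg1/v135cg1/v78cg1',
);

my @out = map { format_restraint($_) } @data;
print "$_\n" for @out;
```

Because the substitution only touches the pieces returned by `split`, the text after the `!` still shows the original cg1 / cg2 names, which is what the requested output looks like.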