Hello,
I'm working on a documentation migration project that involves translating a
group of HTML files into MediaWiki markup. I'm using an application called
Pandoc to do the bulk of the translation, but there are some
application-centric tags that Pandoc refuses to translate. These app-centric
tags are essential for producing searchable indices later on in the
process. My idea is to enclose these app-centric tags in HTML comment
notation so that Pandoc will pass them on. As comments they remain
in the file, but the person viewing the docs won't see them. I can later
programmatically access those tags and process them accordingly.
The tool I want to use is "sed", but I'm rusty and have been
struggling a bit.
This sample line shows what I need to do:
<indexentry "This part is always different>
needs to look like this:
<!-- <indexentry "This part is always different"> -->
The wildcard portion of my sed statement is where I'm having
difficulties.
sed s\%'<indexentry*'%'<!-- <indexentry* -->'%g test.html
gives me: <!-- <indexentry* --> "This part is always different">
Close but no cigar! It's not treating the "*" as a wildcard but passing it
through literally. I'm not escaping it properly, am I? I've tried quite a few
incantations but haven't had any luck. Someone's going to take one look at
this and solve it!
Thanks Jim
Jim Holsenback wrote:
> Hello,
>
> I'm working on a documentation migration project that involves translating a
> group of HTML files into MediaWiki markup. I'm using an application called
> Pandoc to do the bulk of the translation, but there are some
> application-centric tags that Pandoc refuses to translate. These app-centric
> tags are essential for producing searchable indices later on in the
> process. My idea is to enclose these app-centric tags in HTML comment
> notation so that Pandoc will pass them on. As comments they remain
> in the file, but the person viewing the docs won't see them. I can later
> programmatically access those tags and process them accordingly.
>
> The tool I want to use is "sed", but I'm rusty and have been
> struggling a bit.
>
> This sample line shows what I need to do:
>
> <indexentry "This part is always different>
>
> needs to look like this:
>
> <!-- <indexentry "This part is always different"> -->
>
> The wildcard portion of my sed statement is where I'm having
> difficulties.
>
> sed s\%'<indexentry*'%'<!-- <indexentry* -->'%g test.html
>
> gives me: <!-- <indexentry* --> "This part is always different">
>
> Close but no cigar! It's not treating the "*" as a wildcard but passing it
> through literally. I'm not escaping it properly, am I? I've tried quite a few
> incantations but haven't had any luck. Someone's going to take one look at
> this and solve it!
Damn! I don't have sed available right now, and it's been quite some
time since I've raged on Perl regexes... :P
"nemesis" <nam### [at] gmailcom> wrote in message
news:49218a97@news.povray.org...
> damn! I don't have sed available right now and it's been quite some time
> since I've raged on perl regexes... :P
I'm closer ....
with: sed s/'<indexentry '\(*\)*/'<-- <indexentry '/g
<indexentry "This is a test">
gives:
<-- <indexentry "This is a test">
time for a smoke .....
"Jim Holsenback" <jho### [at] hotmailcom> wrote:
> This sample line shows what I need to do:
>
> <indexentry "This part is always different>
Should there be a quote at the end of that? I'll assume so.
> needs to look like this:
>
> <!-- <indexentry "This part is always different"> -->
> sed s\%'<indexentry*'%'<!-- <indexentry* -->'%g test.html
I think the problem is that you need to use \( \) and a \1 to correctly extract
that portion of the text. I have success with:
sed -e 's%<indexentry "\([^"]*\)">%<!-- <indexentry "\1"> -->%g'
To break that down, [^"]* matches everything up to the closing quote, and the
\( \) around it captures that text so you can reference it later with \1. Then
just type the replacement as you would like it to appear, with \1 in place of
the text.
Hope that helps.
- Ricky
"Jim Holsenback" <jho### [at] hotmailcom> wrote:
> "nemesis" <nam### [at] gmailcom> wrote in message
> news:49218a97@news.povray.org...
> > damn! I don't have sed available right now and it's been quite some time
> > since I've raged on perl regexes... :P
>
> I'm closer ....
>
> with: sed s/'<indexentry '\(*\)*/'<-- <indexentry '/g
> <indexentry "This is a test">
I think unless you use a matching \1, you will only be able to make changes UP
TO the quoted part.
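A rough sketch of what I mean, using a made-up test line (the variable names are only for illustration): sed rewrites just the text the pattern matched and passes the rest of the line through, so a pattern that stops at the prefix has no way to put anything after the closing >.

```shell
#!/bin/sh
line='<indexentry "This is a test">'

# Prefix-only match: only '<indexentry ' is rewritten, the rest of the line
# is passed through untouched, so no closing ' -->' can be appended.
prefix_only=$(printf '%s\n' "$line" | sed 's/<indexentry /<!-- <indexentry /g')
printf '%s\n' "$prefix_only"

# Whole-tag match: the pattern runs through the closing '>', and \1 puts the
# captured quoted text back, so the replacement can add text on both sides.
whole_tag=$(printf '%s\n' "$line" | sed 's/<indexentry "\([^"]*\)">/<!-- <indexentry "\1"> -->/g')
printf '%s\n' "$whole_tag"
```

The first command only gets the opening half of the comment onto the line; the second gets both halves.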
Frustrating, but useful. Very useful. Don't give up:
http://xkcd.com/208/
- Ricky
"triple_r" <nomail@nomail> wrote in message
news:web.4921949ae31b5110866b6f750@news.povray.org...
> I think the problem is that you need to use \( \) and a \1 to correctly
> extract
> that portion of the text. I have success with:
>
> sed -e 's%<indexentry "\([^"]*\)">%<!-- <indexentry "\1"> -->%g'
Yes ... this works! I'd given up flogging at this and decided to go back to
basics. I have a book that covers sed and awk, so I did some reading and
decided, based on the results, that I was at least getting a match of sorts
with my first stab at it. The grouping with \( \) and the references \1 and
\2 (back-references to the grouped text) were covered in the book. Your live
working example pulled it all together for me .... Thanks!
Jim
Jim Holsenback wrote:
> sed s\%'<indexentry*'%'<!-- <indexentry* -->'%g test.html
Note that this matches "<indexentr" followed by zero or more 'y' characters :)
The asterisk is not a wildcard. It's a quantifier for the preceding character,
making it match zero or more times.
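A quick sketch to see this in action; the sample strings and variable names are made up for illustration. The first command shows y* matching with the y missing or repeated. The second shows one way to get the intended "anything up to the closing bracket" behaviour, assuming the tag never contains a > inside it: [^>]* matches a run of non-> characters, and & in the replacement stands for the whole matched text.

```shell
#!/bin/sh
# 'y*' means "zero or more y", so both of these lines match the pattern.
star_demo=$(printf '%s\n' '<indexentr>' '<indexentryyy>' | sed 's%<indexentry*%[match]%')
printf '%s\n' "$star_demo"

# '[^>]*' plays the wildcard role here; '&' re-inserts the entire match.
wrapped=$(printf '%s\n' '<indexentry "This part is always different">' | sed 's%<indexentry[^>]*>%<!-- & -->%g')
printf '%s\n' "$wrapped"
```

Both sample lines in the first command come out as [match]>, and the second command wraps the whole tag in a comment without needing a \( \) group.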