POV-Ray: Newsgroups: povray.off-topic: sed question

From: Jim Holsenback
Subject: sed question
Date: 17 Nov 2008 09:34:20
Message: <492180ec@news.povray.org>

Hello,

I'm working on a documentation migration project that involves translating a 
group of HTML files into MediaWiki markup. I'm using an application called 
Pandoc to do the bulk of the translation; however, there are some 
application-centric tags that Pandoc refuses to translate. These app-centric 
tags are essential for producing searchable indexes later on in the 
process. My idea is to enclose these app-centric tags in HTML comment 
notation, so that Pandoc will pass them (the tags) on. As comments they remain 
in the file but the person viewing the docs won't see them, and I can later 
programmatically access those tags and process them accordingly.

 The tool I want to use is "sed" .... however I'm rusty and have been 
struggling a bit.

 This sample line example shows what I need to do:

<indexentry "This part is always different>

 needs to look like this:

<!-- <indexentry "This part is always different"> -->

 the wildcarding portion of my sed statement is where I'm having 
difficulties.

 sed s\%'<indexentry*'%'<!-- <indexentry* -->'%g test.html

 gives me: <!-- <indexentry* --> "This part is always different">

Close but no cigar! It's not treating the "*" as a wildcard but passing it 
on. I'm not escaping it properly am I? I've tried more than several 
incantations but haven't had any luck. Someone's going to take one look at 
this and solve it!

Thanks Jim

From: nemesis
Subject: Re: sed question
Date: 17 Nov 2008 10:15:35
Message: <49218a97@news.povray.org>

Jim Holsenback wrote:
> Hello,
> 
> I'm working on a documentation migration project that involves translating a 
> group of HTML files into MediaWiki markup. I'm using an application called 
> Pandoc to do the bulk of the translation; however, there are some 
> application-centric tags that Pandoc refuses to translate. These app-centric 
> tags are essential for producing searchable indexes later on in the 
> process. My idea is to enclose these app-centric tags in HTML comment 
> notation, so that Pandoc will pass them (the tags) on. As comments they remain 
> in the file but the person viewing the docs won't see them, and I can later 
> programmatically access those tags and process them accordingly.
> 
>  The tool I want to use is "sed" .... however I'm rusty and have been 
> struggling a bit.
> 
>  This sample line example shows what I need to do:
> 
> <indexentry "This part is always different>
> 
>  needs to look like this:
> 
> <!-- <indexentry "This part is always different"> -->
> 
>  the wildcarding portion of my sed statement is where I'm having 
> difficulties.
> 
>  sed s\%'<indexentry*'%'<!-- <indexentry* -->'%g test.html
> 
>  gives me: <!-- <indexentry* --> "This part is always different">
> 
> Close but no cigar! It's not treating the "*" as a wildcard but passing it 
> on. I'm not escaping it properly am I? I've tried more than several 
> incantations but haven't had any luck. Someone's going to take one look at 
> this and solve it!

damn!  I don't have sed available right now and it's been quite some 
time since I've raged on perl regexes... :P

From: Jim Holsenback
Subject: Re: sed question
Date: 17 Nov 2008 10:57:48
Message: <4921947c@news.povray.org>

"nemesis" <nam### [at] gmailcom> wrote in message 
news:49218a97@news.povray.org...
> damn!  I don't have sed available right now and it's been quite some time 
> since I've raged on perl regexes... :P

I'm closer ....

with: sed s/'<indexentry '\(*\)*/'<-- <indexentry '/g
<indexentry "This is a test">

gives:

<-- <indexentry "This is a test">

time for a smoke .....

From: triple r
Subject: Re: sed question
Date: 17 Nov 2008 11:00:01
Message: <web.4921949ae31b5110866b6f750@news.povray.org>

"Jim Holsenback" <jho### [at] hotmailcom> wrote:

>  This sample line example shows what I need to do:
>
> <indexentry "This part is always different>

Should there be a quote at the end of that?  I'll assume so.

> needs to look like this:
>
> <!-- <indexentry "This part is always different"> -->
>  sed s\%'<indexentry*'%'<!-- <indexentry* -->'%g test.html

I think the problem is that you need to use \( \) and a \1 to correctly extract
that portion of the text.  I have success with:

sed -e 's%<indexentry "\([^"]*\)">%<!-- <indexentry "\1"> -->%g'

To break that down, [^"]* matches everything up to the closing quote, and the \(
\) around it lets you reference that text later with \1.  Then just type the
expression as you would like it to appear, with the \1 in place of the text.
Hope that helps.

 - Ricky
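
A minimal sketch of the command above run on a few sample lines, assuming a POSIX sed. The sample input is made up for illustration; only the first indexentry line comes from the thread.

```sh
# Hypothetical sample input standing in for test.html.
input='<p>Intro text</p>
<indexentry "This part is always different">
<indexentry "camera, perspective">'

# \( \) captures the text between the quotes; \1 re-inserts it.
# % is the delimiter, so no slashes need escaping.
result=$(printf '%s\n' "$input" |
  sed -e 's%<indexentry "\([^"]*\)">%<!-- <indexentry "\1"> -->%g')

printf '%s\n' "$result"
```

Lines without an indexentry tag pass through unchanged, and each tag line comes out wrapped in <!-- ... -->.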

From: triple r
Subject: Re: sed question
Date: 17 Nov 2008 11:10:00
Message: <web.49219626e31b5110866b6f750@news.povray.org>

"Jim Holsenback" <jho### [at] hotmailcom> wrote:
> "nemesis" <nam### [at] gmailcom> wrote in message
> news:49218a97@news.povray.org...
> > damn!  I don't have sed available right now and it's been quite some time
> > since I've raged on perl regexes... :P
>
> I'm closer ....
>
> with: sed s/'<indexentry '\(*\)*/'<-- <indexentry '/g
> <indexentry "This is a test">

I think unless you use a matching \1, you will only be able to make changes UP
TO the quoted part.

Frustrating, but useful.  Very useful.  Don't give up:

http://xkcd.com/208/

 - Ricky
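
The point above can be checked with a small sketch, assuming a POSIX sed and the sample line from the thread. A substitution only rewrites what the pattern matched, so a pattern covering just the start of the tag cannot place text after its end. Matching the whole tag and reusing it does work. The & used here stands for the entire match; it is an alternative to \1 that the thread itself does not use.

```sh
line='<indexentry "This is a test">'

# Pattern covers only the tag's prefix: text can be added in front, not after.
prefix_only=$(printf '%s\n' "$line" | sed 's%<indexentry %<!-- <indexentry %')
printf '%s\n' "$prefix_only"

# Pattern covers the whole tag; & re-inserts everything that matched.
whole_tag=$(printf '%s\n' "$line" | sed 's%<indexentry "[^"]*">%<!-- & -->%g')
printf '%s\n' "$whole_tag"
```

The first command leaves the closing " -->" missing; the second produces the fully commented tag.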

From: Jim Holsenback
Subject: Re: sed question
Date: 17 Nov 2008 12:21:00
Message: <4921a7fc@news.povray.org>

"triple_r" <nomail@nomail> wrote in message 
news:web.4921949ae31b5110866b6f750@news.povray.org...
> I think the problem is that you need to use \( \) and a \1 to correctly 
> extract
> that portion of the text.  I have success with:
>
> sed -e 's%<indexentry "\([^"]*\)">%<!-- <indexentry "\1"> -->%g'

Yes ... this works! I'd given up flogging at this and decided to go back to 
basics. I have a book that covers sed and awk so I did some reading and 
decided that I was at least getting a match of sorts with my first stab at 
it because of the results. The grouping with \( \) and the reference to \1 and 
\2 (back-references) was something covered in the book. Your live 
working example pulled it all together for me .... Thanks!

Jim

From: Nicolas Alvarez
Subject: Re: sed question
Date: 17 Nov 2008 13:00:23
Message: <4921b136@news.povray.org>

Jim Holsenback wrote:
>  sed s\%'<indexentry*'%'<!-- <indexentry* -->'%g test.html

Note that this matches "<indexentr" followed by zero or more 'y' characters :)
The asterisk is not a wildcard. It's a quantifier for the previous character,
making it match zero or more times.
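
A short sketch of that behaviour, assuming a POSIX sed. The [MATCH] marker is only there to make the matched span visible; the last command is the original attempt with the shell quoting removed.

```sh
line='<indexentry "This part is always different">'

# y* means "zero or more y", so the match ends right after "<indexentry".
star=$(printf '%s\n' "$line" | sed 's%<indexentry*%[MATCH]%')
printf '%s\n' "$star"      # [MATCH] "This part is always different">

# Zero y characters also match.
no_y=$(printf '%s\n' '<indexentr>' | sed 's%<indexentry*%[MATCH]%')
printf '%s\n' "$no_y"      # [MATCH]>

# . matches any character, so .* is the regex counterpart of a wildcard.
dotstar=$(printf '%s\n' "$line" | sed 's%<indexentry.*%[MATCH]%')
printf '%s\n' "$dotstar"   # [MATCH]

# The original attempt: in the replacement, * is just a literal character.
original=$(printf '%s\n' "$line" | sed 's%<indexentry*%<!-- <indexentry* -->%g')
printf '%s\n' "$original"  # <!-- <indexentry* --> "This part is always different">
```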
