Strip text from log files contained within recursive tar



Hi all,

I work with a product that dumps logs to various tar files and then puts those in another tar file. Support personnel request these logs frequently, but we cannot pass them on because they contain IP addresses, which as a secure site we cannot allow a 3rd party to see. Does anyone know of a way to open recursive tar files, strip out any IP address found in the logs, tar the stripped logs and move on to the next tar?

This is actually for Windows, but since tar is most commonly associated with Unix I thought you guys would be able to advise. In Windows we currently use a tool provided by EMC, but it's old and crashes at certain stages. We have to untar the files manually and run the tool against the extracted tar files.

Thanks


Let me just check I am reading this right :)

So you have one big tar file with lots of other tar files in it. You want to extract each tar file, search the files in it for IP addresses and remove them, re-tar the small file, and then re-tar everything as the big file. Correct?

How are the tar files named? And how are the log files named?

You can use grep with a regex to find the IP addresses (I am at work and only have access to Windows machines, so this has not been tested):

grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' ip.txt
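
For anyone without grep on Windows, the same search can be sketched in Python. This is only an illustration: `find_ips` is a made-up helper name, and unlike the grep pattern it also drops matches with an octet above 255, which is an extra assumption about what should count as an IP address.

```python
import re

# Same shape as the grep pattern: four dot-separated groups of 1-3 digits.
IP_PATTERN = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")


def find_ips(text):
    """Return dotted-quad strings in text whose octets are all in 0-255."""
    return [
        m.group(0)
        for m in IP_PATTERN.finditer(text)
        if all(int(octet) <= 255 for octet in m.group(0).split("."))
    ]


if __name__ == "__main__":
    sample = "blah\n192.168.0.1\nversion 999.1.2.3 seen from 10.0.0.7\n"
    print(find_ips(sample))  # ['192.168.0.1', '10.0.0.7']
```

For stripping rather than just listing, it is probably safer to skip the 0-255 check so that anything address-shaped gets masked.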

You should also be able to use sed to find and replace:

sed 's/\([0-9]\{1,3\}\.\)\{3,3\}[0-9]\{1,3\}/x.x.x.x/g' ip.txt > out.txt

this will take ip.txt

blah
192.168.0.1

and create out.txt

blah
x.x.x.x
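
If sed is not available, a rough Python equivalent of the substitution might look like the sketch below. The names `mask_ips` and `mask_file` are hypothetical, and the default mask of 0.0.0.0 is an assumption; pass `b"x.x.x.x"` to get the same output as the sed command. It works on bytes so that logs in an unknown encoding pass through unchanged apart from the masked addresses.

```python
import re

# Byte-level version of the sed pattern: three "digits + dot" groups, then digits.
IP_PATTERN = re.compile(rb"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")


def mask_ips(data, mask=b"0.0.0.0"):
    """Replace every IPv4-looking run of bytes in data with mask."""
    return IP_PATTERN.sub(mask, data)


def mask_file(src_path, dst_path, mask=b"0.0.0.0"):
    """Read src_path, mask the addresses, and write the result to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(mask_ips(data, mask))


if __name__ == "__main__":
    print(mask_ips(b"blah\n192.168.0.1\n").decode())
```

Calling `mask_file("ip.txt", "out.txt", b"x.x.x.x")` would mirror the ip.txt to out.txt example above.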


It wouldn't be hard at all to script that in Windows with batch, PowerShell, or VBScript: extract the TARs, then use a sed equivalent, as @Haggis recommends, to parse each log and simply replace any IP address with 0s.

Extracting the TAR manually and then parsing with the EMC tool is overly tedious, even if it were totally stable :)

You do need to provide an example of at least the structure of the TAR archive and a log file to get a real workable solution though.


  • 2 weeks later...

First off, apologies for the late reply. I've been busy and forgot I had posted this.

To the question of whether I have one big tar file with lots of other tar files in it, and want to extract each one, strip the IP addresses from the logs, re-tar the small files and then re-tar the big one: correct.

Ideally a Windows solution would be better. I have 7-Zip installed so I can use 7z.exe to extract. I guess the Win32 binary for sed would need to be installed as well.

Structure of the file:

The tar
> A folder
>> A tar.gz file
>>> A tar file
>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub). Folders too. One containing a tgz file. 
>>> A tar file
>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub). Folders too. One containing a tgz file. 
 
I can't give you an example file, sadly, for the very reason this thread exists :)
If we can replace the IPs with 0s that would be great. I suspect their log parsing tool would break if it expected numbers and found x's instead.
Is it possible to do this without having to know the structure? I ask because I found another log it outputs with a different structure.

The other file:
The tar
> A folder
>> A tar.gz file
>>> A tar file (about 10 of these with same structure)
>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub). Two gz.tar files. A folder with logs.
>>> A tar file (about 10 of these with same structure)
>>>> Three folders
>>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub). More folders with gz.tar and logs.
>>> A gz.tar file
>>>> A tar file
>>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub, ph0-7, atu, info, vtu, port, fru ). Three folders with other folders and logs. A folder with logs.
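
One way to handle both layouts without knowing the structure in advance is to decide by content rather than by extension: try to open every file as a tar, recurse if that works, and otherwise treat it as a log and mask it. The Python sketch below does this with the standard `tarfile` module, so it should run on Windows without 7-Zip or sed. Everything in it is an assumption rather than a tested solution for these particular bundles: the function names are made up, only plain and gzip-compressed tars are recognised, each archive is held in memory, and IP addresses that appear in file names inside an archive are left alone.

```python
import io
import re
import tarfile

IP_PATTERN = re.compile(rb"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def sanitize_bytes(data, mask=b"0.0.0.0"):
    """Mask IPs in data; if data is a tar or tar.gz, rebuild it recursively."""
    comp = "gz" if data[:2] == GZIP_MAGIC else ""
    try:
        src = tarfile.open(fileobj=io.BytesIO(data), mode="r:" + comp)
    except (tarfile.TarError, OSError, EOFError):
        # Not an archive we understand: treat it as a log file.
        return IP_PATTERN.sub(mask, data)

    out = io.BytesIO()
    with src, tarfile.open(fileobj=out, mode="w:" + comp) as dst:
        for member in src:
            if member.isreg():
                # Regular file: may be a log or another nested archive.
                payload = sanitize_bytes(src.extractfile(member).read(), mask)
                member.size = len(payload)  # masking can change the length
                dst.addfile(member, io.BytesIO(payload))
            else:
                # Directories, links, etc. are copied through as they are.
                dst.addfile(member)
    return out.getvalue()


def sanitize_archive(src_path, dst_path, mask=b"0.0.0.0"):
    """Read the outer tar from src_path and write a masked copy to dst_path."""
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dst_path, "wb") as f:
        f.write(sanitize_bytes(data, mask))
```

Usage would be something like `sanitize_archive("logs.tar", "logs_clean.tar")`, where both file names are placeholders. Because the recursion follows whatever nesting it finds, the two layouts above should need no special handling, though a gzipped log that is not a tar would pass through unmasked in this sketch.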

  • 2 weeks later...

Sorry, I have just been really busy recently. I will look at this as soon as I can :)
