Strip text from log files contained within recursive tar


8 replies to this topic

#1 ]SK[

]SK[

    Neowinian Senior

  • Tech Issues Solved: 2
  • Joined: 12-October 04
  • Location: Nottingham, UK
  • OS: Windows 8.1
  • Phone: Nexus 5

Posted 13 December 2013 - 11:52

Hi all,

 

I work with a product that dumps logs to various tar files, then puts them in another tar file. These logs are requested frequently by support personnel; however, we cannot pass them on because they contain IP addresses which, as a secure site, we cannot allow a 3rd party to see. Does anyone know of a way to open nested tar files recursively, strip out any IP addresses found in the logs, re-tar the stripped logs and move on to the next tar?

This is actually for Windows, but since tar is most commonly associated with Unix I thought you guys would be able to advise. In Windows we currently use a tool provided by EMC; however, it's old and crashes at certain stages. We have to untar the files manually and run the tool against the extracted tar files.

 

Thanks




#2 Haggis

Haggis

    Neowinian Senior

  • Tech Issues Solved: 12
  • Joined: 13-June 07
  • Location: Near Stirling, Scotland
  • OS: Debian 7
  • Phone: Samsung Galaxy S3 LTE (i9305)

Posted 13 December 2013 - 12:00

Let me just check I am reading this right :)

So you have one big tar file with lots of other tar files in it, and you want to:

  • extract the tar file
  • search the files for IP addresses and remove them
  • re-tar the small file and then re-tar as a big file

Correct?

How are the tar files named? And how are the log files named?

You can use grep and a regex to find the IP addresses (I am at work and only have access to Windows machines, so this has not been tested):

grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' ip.txt
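
For what it's worth, the pattern above also matches strings such as 999.999.999.999. Below is a rough Python sketch of a stricter match, assuming only IPv4 dotted quads matter. The names OCTET, IPV4_RE and find_ips are made up for illustration and are not from any tool mentioned in this thread.

```python
import re

# One octet: a decimal number from 0 to 255 with no leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# Four octets joined by dots. The lookarounds reject matches that are glued
# to neighbouring digits or that sit inside a longer dotted number.
IPV4_RE = re.compile(
    r"(?<![0-9.])" + OCTET + r"(?:\." + OCTET + r"){3}(?![0-9])(?!\.[0-9])"
)


def find_ips(text):
    """Return every IPv4-looking address found in text, in order."""
    return IPV4_RE.findall(text)
```

The stricter pattern skips five-part numbers like 1.2.3.4.5 and addresses written with leading zeros. For redaction, a missed address costs more than a false positive, so the looser grep pattern may well be the safer choice. The strict one is more useful for auditing what is in a log.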


#3 Haggis

Posted 13 December 2013 - 14:01

You should also be able to use sed to find and replace:

 

 

sed 's/\([0-9]\{1,3\}\.\)\{3,3\}[0-9]\{1,3\}/x.x.x.x/g' ip.txt > out.txt
 

 

This will take ip.txt:

blah
192.168.0.1

and create out.txt:

blah
x.x.x.x
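
Since the target here is Windows, where sed is not installed by default, the same substitution could be sketched in Python. This is only an illustration under a few assumptions: the function names scrub_bytes and scrub_file are invented, the pattern is the same loose one as the sed command, and the replacement defaults to 0.0.0.0 rather than x.x.x.x. Files are handled as raw bytes so that odd encodings in the logs do not cause decode errors.

```python
import re

# Same loose pattern as the sed command: four groups of 1-3 digits.
# It deliberately over-matches (e.g. 999.1.1.1), which errs on the
# side of redacting too much rather than too little.
LOOSE_IPV4 = re.compile(rb"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")


def scrub_bytes(data, replacement=b"0.0.0.0"):
    """Return data with every dotted quad replaced."""
    return LOOSE_IPV4.sub(replacement, data)


def scrub_file(src_path, dst_path):
    """Copy src_path to dst_path line by line with dotted quads zeroed out."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(scrub_bytes(line))
```

Something like scrub_file("ip.txt", "out.txt") would then play the role of the sed command above.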


#4 Haggis

Posted 13 December 2013 - 19:34

Still waiting on your example of a log file so I can do this.



#5 thomastmc

thomastmc

    Unofficial Attorney of Neowin

  • Joined: 18-July 12
  • Location: Kansas City
  • OS: Windows 8.1 Pro
  • Phone: Lumia 928

Posted 13 December 2013 - 20:14


It wouldn't be hard at all to script that in Windows with batch, PowerShell, or VBS: extract the TARs and use a sed equivalent, as @Haggis recommends, to parse the logs and simply replace any IP address with 0s.

 

Extracting the TAR manually and then parsing with the EMC tool is overly tedious, even if it were totally stable :)

 

You do need to provide an example of at least the structure of the TAR archive and a log file to get a real workable solution though.



#6 OP ]SK[

Posted 23 December 2013 - 12:04

First off, apologies for the late reply. I've been busy and, because of that, forgot I wrote this.

 

 

 

Haggis said:

    Let me just check I am reading this right :) So you have one big tar file with lots of other tar files in it; you want to extract the tar file, search the files for IP addresses and remove them, re-tar the small file and then re-tar as a big file. Correct?

Correct

 

thomastmc said:

    It wouldn't be hard at all to script that in Windows with batch, PowerShell, or VBS: extract the TARs and use a sed equivalent, as @Haggis recommends, to parse the logs and simply replace any IP address with 0s.

    Extracting the TAR manually and then parsing with the EMC tool is overly tedious, even if it were totally stable :)

    You do need to provide an example of at least the structure of the TAR archive and a log file to get a real workable solution though.

 

Ideally a Windows solution would be better. I have 7-Zip installed, so I can use 7z.exe to extract; I guess the Win32 binary for sed would need to be installed as well.

Structure of the file...

 

The tar
> A folder
>> A tar.gz file
>>> A tar file
>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub). Folders too. One containing a tgz file. 
>>> A tar file
>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub). Folders too. One containing a tgz file. 
 
I can't give you an example file, sadly, for the very reason this thread exists :)
If we can replace the IPs with 0s that would be great. I'm thinking their log parsing tool might break if it was expecting numbers and found x's instead.
Is it possible to do this without having to know the structure? I ask because I found another log bundle it outputs with a different structure.
 
The other file...
 
The tar
> A folder
>> A tar.gz file
>>> A tar file (about 10 of these with same structure)
>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub). Two gz.tar files. A folder with logs.
>>> A tar file (about 10 of these with same structure)
>>>> Three folders
>>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub). More folders with gz.tar and logs.
>>> A gz.tar file
>>>> A tar file
>>>>> Lots of logs although their extension is random (out, leases, nothing, sh, log, cat, pid, xml, pub, ph0-7, atu, info, vtu, port, fru ). Three folders with other folders and logs. A folder with logs.
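
On the question of doing this without knowing the structure: one possible approach is to decide what each file is by trying to open it as a tar, not by trusting its extension, and to recurse whenever that succeeds. The sketch below uses Python's standard tarfile module and works entirely in memory. It is untested against the real bundles and rests on several assumptions: scrub_blob and scrub_archive_file are invented names, nested archives are plain tar or gzip-compressed tar only, everything fits in RAM, and members are ordinary files, folders or links.

```python
import io
import re
import tarfile

# Loose dotted-quad pattern, same idea as the sed command earlier in the thread.
LOOSE_IPV4 = re.compile(rb"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")
GZIP_MAGIC = b"\x1f\x8b"


def scrub_blob(data):
    """Scrub one file's bytes, recursing if it is a (possibly gzipped) tar."""
    is_gzip = data[:2] == GZIP_MAGIC
    try:
        # "r:*" lets tarfile detect the compression by itself.
        src = tarfile.open(fileobj=io.BytesIO(data), mode="r:*")
    except tarfile.ReadError:
        # Not an archive: treat it as a log and zero out dotted quads.
        return LOOSE_IPV4.sub(b"0.0.0.0", data)

    out = io.BytesIO()
    # Rebuild the archive with the same compression it arrived with.
    with src, tarfile.open(fileobj=out, mode="w:gz" if is_gzip else "w") as dst:
        for member in src:
            if member.isfile():
                payload = scrub_blob(src.extractfile(member).read())
                member.size = len(payload)  # the scrubbed length may differ
                dst.addfile(member, io.BytesIO(payload))
            else:
                # Directories, symlinks, etc. carry no file data.
                dst.addfile(member)
    return out.getvalue()


def scrub_archive_file(src_path, dst_path):
    """Read the outer tar from src_path and write a scrubbed copy to dst_path."""
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dst_path, "wb") as f:
        f.write(scrub_blob(data))
```

Known gaps in this sketch: a gzipped single log that is not a tar is passed through still compressed, so addresses inside it are not touched. File and folder names that contain addresses are left as they are. Archives compressed with anything other than gzip are rewritten uncompressed.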


#7 OP ]SK[

Posted 06 January 2014 - 11:29

Guess I scared everyone? :(



#8 Haggis

Posted 06 January 2014 - 20:40

Sorry, I have just been really busy recently. I will look at this as soon as I can :)



#9 OP ]SK[

Posted 14 January 2014 - 12:43

I kind of figured this would be harder than first thought.